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1.0  INTRODUCTION 

In  this  report,  we  present  methods  for  estimating  the  parameters  in  models 
of  the  form 


x(£,a)  =  F(0)  a  ,  (1-1) 

which  has  been  called  a  separable,  reducible,  or  semilinear  model  [1] ,  [2] .  In 
this  type  of  model,  the  columns  of  the  matrix  F (0)  ,  called  the  basis  function 
matrix,  are  the  basis  vectors  that  span  the  space  of  the  model.  The  vector  0 
contains  the  parameters  that  enter  into  the  model  nonlinearly  (the  nonlinear 
parameters) ,  and  the  vector  a  contains  the  parameters  that  enter  into  the 
model  linearly  (the  linear  parameters) .  We  assume  that  we  have  a  vector,  x, 
containing  observations  of  the  model  corrupted  by  white  Gaussian  noise. 

In  estimating  0  and  a,  we  restrict  our  attention  to  maximum  likelihood 
estimation  that  (sine"  we  assume  white  Gaussian  noise)  reduces  to  least- 
squares  estimation.  That  is,  we  minimize  the  cost  functional  (or  error  norm) 


*(£>a)  =  I  ~  x(0,a) 


(1-2) 


A  well  known  simplification  of  the  least-squares  problem  takes  advantage 
of  the  structure  of  the  semilinear  model  and  replaces  Eq.  (1-2)  with  a  new 
cost  functional  called  the  Variable  Projection  Functional  (VPF) 


(1-3) 


Here  Pp-H0)  is  the  projector  onto  the  orthogonal  complement  of  the  column 
space  of  the  matrix  F(0) .  The  term  variable  projection  functional  arises 
because  the  projector  is  a  function  that  varies  with  the  nonlinear  parameters, 
unlike  the  fixed  projector  typically  encountered  in  the  linear  least-squares 
problem  (Figure  1-1  demonstrates  the  operation  of  a  fixed  projector) .  For 
those  readers  unfamiliar  with  the  VPF,  we  provide  tutorial  material  relating 
to  the  derivation  of  the  VPF  and  to  existing  methods  for  minimizing  the  VPF. 
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Fig,  1-1  The  effect  of  proj'»:ting  an  arbitrary  vector  y  onto  the  coluan  space 
of  thj  aatrix  F,  symbolized  by  the  plane,  and  onto  the  orthogonal  complement 
of  the  coluan  foace. 


As  noted  by  Osborr>e  [3] ,  yet  another  equivalent  cost  functional  can  be 
obtained,  if  one  has  available  some  nabrix  B(fl)  such  that 

Bfi!)  F(fi)  -  0.  (1-4) 

This  second  type  of  YPF  contains  a  projector  onto  the  row  space  of  the  matrix 
B(0) ,  and  is  written 


2 

m  -  ||  sfvfi)  x  ||  •  a-5) 

In  addition  to  the  tutorial  material  presented,  we  provide  a  detailed 
discusion  of  Newton  algorithms  that  use  an  exact  Hessian  matrix  for  each  of 
the  VPF’s.  All  algorithms  presented  in  this  report  are  given  in  terms  of 
QR-based  orthogonal  factorizations  of  matrices  and  are  implementable  directly 
from  the  material  presented  here. 

The  performance  of  these  new  ML  algorithms  is  examined  Xu  s,  companion 
report  [4] ,  and  is  there  compared  to  previously  published  estimators .  The 
focus  here,  however,  is  on  providing  the  theoretical  mathematical  concepts  on 
which  the  algorithms  are  based,  and  on  providing  a  detailed  discussion  of  the 
algorithm  implementations . 
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As  a  vehicle  o£  discussion  for  introducing  the  concepts  and  algorithms,  we 
will  refer  to  a  specific  model,  which  is  a  signal  consisting  of  a  sum  of 
complex  sinusoids,  also  expressable  as  a  sum  of  undamped  or  damped  sinusoids. 
In  this  model,  the  nonlinear  parameters  of  the  semilinear  model  correspond  to 
the  signal  poles,  and  the  linear  parameters  are  the  amplitudes  of  each 
sinusoid. 

In  estimating  the  parameters  of  the  exponential  signal,  the  matrix  B(0)  is 
a  convolution  matrix,  which  corresponds  to  the  prediction  filter  that 
annihilates  the  ideal  signal.  Since  this  matrix  is  a  linear  function  of  the 
prediction  coefficients,  it  allows  for  significant  computational  savings  in 
the  algorithms.  Because  of  the  functional  relationship  of  B  to  the  prediction 
coefficients,  we  will  write  B(b) ,  where  b  is  a  vector  containing  the 
prediction  coefficients.  The  nonlinear  parameters  in  0  can  then  be  obtained 
as  the  roots  of  the  polynomial  whose  coefficients  are  the  prediction 
coefficients. 

To  distinguish  between  the  two  VPF’s  throughout  this  report,  we  will  refer 
to  the  VPF  containing  the  column  space  projector  as  the  signal  basis  VPF, 
while  we  will  refer  to  the  VPF  containing  the  row  space  projector  as  the 
prediction  filter  VPF.  In  the  main  body  of  this  report,  we  develop  two 
parallel  paths.  One  describes  theory  and  algorithms  for  obtaining  estimates 
of  the  signal  poles  directly  via  minimization  of  a  signal  basis  VPF.  The 
other  describes  how  to  obtain  pole  estimates  indirectly  by  first  providing 
estimates  of  the  prediction  filter  coefficients  via  minimization  of  a 
prediction  filter  YPF  and  then  transforming  the  prediction  coefficients  into 
pole  estimates.  In  both  cases,  the  amplitudes  are  obtained  by  solving  a 
linear  least-squares  problem  in  which  the  basis  function  matrix  is  constructed 
from  the  ML  pole  estimates. 


1 . 1  New  Material 

While  the  application  of  variable  projection  techniques  to  estimating  the 
parameters  for  exponential  signals  in  noise  is  not  new,  several  aspects  of  the 
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problem  as  considered  here  have  not  been  encountered  in  the  literature. 
Foremost,  w&  here  extend  existing  variable  projection  methods  by  deriving  the 
Hessian  matrix  for  the  general  YPF  and  by  providing  implementations  of 
Newton’s  method  for  each  of  the  two  parallel  paths. 

Furthermore,  we  constrain  the  exponential  signal  model  to  include  known 
poles.  In  each  of  the  two  parallel  paths,  these  constraints  are  introduced 
differently.  In  the  direct  pole  estimation  path,  we  use  a  technique  which  is 
similar  to  deflation  in  the  eigenvector  problem;  i.e.,  we  project  the  signal 
model  and  the  observation  vector  onto  the  orthogonal  complement  of  the 
subspace  spanned  by  the  basis  functions  that  correspond  to  the  known  poles. 

The  method  of  introducing  the  constraint  for  this  VPF  is  general  and  not 
limited  to  the  case  of  an  exponential  signal. 

In  the  prediction  coefficient  estimation  path,  we  factor  the  prediction 
filter  convolution  matrix  into  two  convolution  matrices,  one  corresponding  to 
the  known  poles  and  one  corresponding  to  the  unknown  poles.  Since  this 
factorization  is  facilitated  by  the  nature  of  the  convolution  matrix  (and  thus 
the  exponential  signal) ,  this  constraint  is  not  applicable  to  the  general 
model . 


1.2  Report  Organisation 

In  Chapters  2  and  3,  we  introduce  the  notation  that  will  be  followed 
throughout  the  report.  Chapter  2  contains  the  signal  model  and  a  discussion 
of  the  ML  estimator  for  this  signal  in  white  noise.  Chapter  3  provides  a 
numerical  link  between  the  ML  error  norm  and  the  two  different  variable 
projection  error  norms,  which  are  the  Signal  Basis  Variable  Projection 
Functional  and  the  Prediction  Filter  Variable  Projection  Functional. 

In  Chapter  4,  we  introduce  the  pole  constraints  discussed  above.  This  is 
followed,  in  Chapter  5,  by  derivations  of  the  Jacobian  matrix,  gradient 
vector,  and  Hessian  matrix  for  each  of  the  two  variable  projection  error 
norms.  In  Chapter  6,  we  discuss  orthogonal  factorization  techniques  that  are 
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key  components  the  nonlinear  optimization  schemes.  Finally,  in  Chapter  7, 
we  examine  the  implementations  of  the  various  optimization  techniques,  based 
on  the  results  of  Chapters  5  and  6. 

In  addition  to  material  listed  above,  we  provide  fairly  extensive  tutorial 
material  in  a  number  of  appendices.  In  the  first  of  these,  we  examine  key 
aspects  of  Linear  Least-Squares  (LLS)  theory  (such  as  projection  operators  and 
pseudoinverses)  and  describe  how  some  of  the  popular  orthogonal  factorizations 
are  used  in  solving  LLS  problems.  This  material  is  incl  .ded  not  only  as 
background  for  this  report  but  also  as  a  mathematical  base  for  some  of  the 
nonmaximum  likelihood  algorithms  discussed  in  the  companion  report  [4] .  In 
Appendix  B,  we  go  a  layer  deeper  in  tho  LLS  discussion  by  examining  the  use  of 
Bouseholder  reflectors  as  a  method  of  implementing  the  orthogonal 
factorizations.  Id  Appendix  C,  we  discuss  optimization  techniques  which  are 
useful  in  minimizing  least-squares  and  variable  projection  functionals. 

In  Appendices  D,  E,  and  F,  we  focus  on  the  variable  projection  material, 
parallelling  and,  hopefully,  enhancing  the  discussions  of  Golub  and  Pereyra 
[1] » ]  [5] .  Appendix  D  contains  derivations  for  the  derivatives  of  the 
projection  operators  and  pseudoinverse  necessary  to  calculate  the  Jacobian, 
gradiant,  and  Hessian.  Appendix  E  discusses  the  simplified  tensor  notation 
introduced  by  Golub  and  Pereyra,  which  centers  around  the  Frechet  derviative 
of  a  mapping.  The  drawbacks  of  this  notation  in  computing  second  derivatives 
are  discussed.  We  then  expound  upon  a  key  proof  from  [1]  that  ties  the 
variable  projection  method  to  the  straightforward  nonlinear  least-squares 
method,  and  hence  to  the  ML  method. 
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2.0  MCKGROnND 

In  this  chapter  we  introduce  the  signal  model  to  be  used  as  a  vehicle  for 
discussing  the  variable  projection  concepts  and  algorithms.  In  particular, 
the  signal  model  will  be  the  response  of  a  system  to  a  stepped  sinusoid  input, 
where  the  system  can  be  characterized  by  a  linear  constant  coefficient 
ordinary  differential  equation.  This  will  include  steady-state  terms  (whose 
frequency  corresponds  to  the  excitation)  and  several  decaying  terms  (whose 
damping  and  frequency  correspond  to  the  system  poles) .  Following  the 
introduction  of  the  signal  model,  we  will  then  briefly  discuss  the  maximum 
likelihood  estimator.  Specifically,  we  will  show  that  for  signals  in  white 
Gaussian  noise,  maximum  likelihood  estimation  simplifies  to  least-squares 
estimation. 


2.1  Signal  Model 

We  consider  the  stepped  sine  response  of  a  linear  system  modeled  as  a  sum 
of  real  exponentials  in  additive  white  Gaussian  noise, 


where 


y  =  x  (0,a)  +  e  , 

'n  nv  — '  n’ 

xn(£,a)  =  +  cos (2wf ^nT)  +  sin(2rf^nT) 

+  Cg  exp(tt2nT)  cos^rfgnT) 

+  Sg  exp(agnT)  sin(2TfgnT) 


+  0^  exp^nT)  cos(2jrfpnT) 

+  3^  exp(fltpnT)  sin(2jrfpn!)  , 


=  [  “2  -  f2  > 
*  ‘  [  °0  -  °1  •  S1 


f 

P 


] 


T 


i 


(2-1) 
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Here  T  is  sampling  interval  and  is  a  sequence  of  white  Gaussian  noise. 
Here,  f^  is  the  known  excitation  frequency,  and  the  term  Cq  is  included  to 
account  for  the  DC  biases  introduced  during  the  measurement  process. 

We  can  write  the  signal  in  vector  form  so  that 

I  =  *(£,»)  +  e,  (2-2) 


where 


1 


*  [  *0  -  *1 


and  cov(e)  =  e  eT  j  =  ffg2  1^  , 

2 

where  is  the  noise  variance,  and  1^  is  an  NXN  identity  matrix.  The  signal 
basis  functions  for  our  signal  model  are 


f  -  (0)  =  1 
n,  1 v_/ 

fQ  2(0)  =  cos(2*f ^nT) 
fn>3(£)  =  sin(2irf1nT) 
fQ  ^(£)  =  exp(a2nT)  cos(2?rf2nT) 
fn>5(fi)  =  exp(a2nT)  sin(2rf2nT) 


f  u  i  (£)  =  exp  (a  nT)  cos  (2irf  nT) 

n j M“X  p  p 

fn,M^  =  exP(apnT)  sin(2*fpnT)  f 
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where  M  =  2p+l  is  the  number  of  independent  basis  functions  required  to  span 
the  ideal  signal  space.  We  can  now  define  a  set  of  spanning  signal  basis 
vectors,  given  as 


T 

ijW  =  [  lo.jte)  ■  iltjW  ,  ■■■  ,  ]  ,  i  =  . . 


Finally,  we  define  the  signal  basis  function  matrix 


F(£)  =  [  4(£)  ,  i2(£)  ,  ...  ,  %(£)  ]  , 

whose  columns  each  correspond  to  one  of  the  basis  functions  and  whose  rows 
each  correspond  to  an  instant  of  time  in  the  observation  window.  We  can  now 
write  x(£,a)  in  the  separable  form  (as  the  matrix-vector  product) 


x(£> a)  =  F (£)  a  . 


Our  goal  is  to  estimate  the  parameter  vectors  0  and  a. 


(2-3) 


2.2  Maximum  Likelihood  Estimation  for  Signals  in  White  Noise 

In  maximum  likelihood  estimation,  we  wish  to  find  the  parameters  that 
maximize  the  probability  density  of  the  observed  data  given  the  unknown 
parameters.  Letting  0  and  a  be  the  collection  of  unknown  parameters,  we  wish 
to  find  0  and  a  which  maximize  the  conditional  density  function  (the 
likelihood  function) 


0* ,a*  =  arg  min  p(yQ,y1, • • . ,yN-1 |£,a)  . 

For  independently  and  identically  distributed  Gaussian  noise,  the  likelihood 
function  is 


p(y0,y1,...,yN_1l£,a) 


1 

r_  i  r 

(2TO£2  ' 

N/2 

c 

n 


(2-4) 
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Since  the  logarithm  is  a  monotonic  function,  maximizing  the  logarithm  of 
the  likelihood  function  yields  the  same  result  as  maximizing  the  likelihood 
function  itself. 

We  may  therefore  optimize  the  log  likelihood  function  as 
L(£,a)  =  In  {  p(yQ,  yv  ...,  yN_1l£,a)  ) 


— r  M2*)  -  n  -  r-2  Z  ( yn  -  *„(£,») )  •  <2-5) 

2 a  c  n=0 

Only  the  last  of  the  terms  in  Eq.  (2-5)  contains  £  and  a,  and  it  appears  in 
the  expression  with  a  negative  sign.  Therefore,  since  the  noise  variance  is 
assumed  to  be  known,  the  parameters  that  maximize  the  likelihood  function  are 
those  which  minimize 


yn- 


which  is  just  the  least-squares  functional 


*(£>a)  =  | 

'rtl 

SJ 

XI 

i 

H 

=  | 

i  -  F(4)  a  | | 

(2-6) 
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3.0  VARIABLE  PROJECTION  ERROR  NORMS 

As  we  have  just  seen,  the  Maximum  Likelihood  (ML)  estimator  reduces  to  the 
Least-Squares  (LS)  estimator  when  the  ideal  signal  is  corrupted  with  additive 
white  Gaussian  noise.  Here,  we  obtain  further  simplifications  by  noting  the 
structure  of  the  ideal  signal  and  introduce  two  modified  error  norms,  or 
variable  projection  functionals. 

We  begin  by  stating  an  observation  by  Golub  and  Pereyra  [1]  concerning 
separable  least-squares  problems,  for  which  one  can  first  optimize  a  reduced 
set  of  parameters  and  then  find  the  remaining  parameters  as  a  function  of  the 
first  set.  This  leads  to  the  basis  function  Variable  Projection  Functional 
(VPF)  which,  in  our  case,  allows  us  to  find  ML  estimates  of  the  signal  poles 
independently  of  the  weighting  coefficients  (amplitudes) . 

From  the  signal  basis  VPF,  we  follow  Kumaresan,  Scharf,  and  Shaw  [6],  and 
Bressler  and  Macovski  [7] ,  and  introduce  the  deterministic  functional 
relationship  between  the  signal  poles  and  the  prediction  filter  coefficients 
for  the  filter  that  annihilates  the  ideal  signal.  We  can  then  define  a  VPF 
based  on  the  prediction  filter  coefficients  and  show  that  this  prediction 
filter  VPF  is  equivalent  to  the  basis  function  VPF,  allowing  us  to  optimize 
with  respect  to  the  prediction  coefficients. 


3.1  Signal  Basis  Variable  Projection  Functional 

As  shown  in  the  previous  chapter,  ML  estimation  of  parameters  0  and  a  can 
be  achieved  by  minimizing  the  nonlinear  least-squares  error  norm;  i.e.,  by 
finding 

9* , a*  =  arg  min  *(£,a) 

£,  a 

=  arg  min  j I  e(0, a)  1  I  ,  (3-1) 

0,  a  1  1  " 

where  the  error  vector  e(0, a)  is  defined  as  e(0,a)  y;  -  F($)  a. 
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One  approach  to  finding  the  optimal  0  and  a  consists  of  a  three-step 
process : 

1.  Minimize  with  respect  to  a  in  order  to  obtain  the  optimal  function 

a(0)  =  arg  min  *(0,  a).  (3-2) 

a 

2.  Minimize 

m  =  *{£>*(£)}  (3-3) 

with  respect  to  0  to  yield 

0*  =  arg  min  f (0) .  (3-4) 

6 

3.  Calculate  the  optimal  numerical  values  for  a  as 

a*  =  a(0*) .  (3-5) 

It  has  been  shown  by  Golub  and  Pereyra  [1]  that,  for  separable  signal 

models,  this  multi-step  optimization  process  yields  the  same  values  of  0  and 
£ 

a  as  does  simultaneous  optimization  of  ^(0, a)  in  both  0  and  a.  In  Appendix 
F,  we  summarize  their  proof. 

We  now  obtain  a (0)  by  differentiating  the  error  functional  ^(0,a.)  with 
respect  to  a  and  equating  the  resulting  expression  to  zero. 
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h  =  fe  II  II 

T 

=  {  (  £(£,a)  ]  e(£, a)  ) 

=  2  fc  {  (  s(2,s)  )  } 

-  2  k  {  (  *  -  *  f  }  (  X  -  F(ffl  a  ) 


=  -  2  ft(M  ( I  -  F(ffl  a  }  (3-6) 

Equating  this  to  zero  yields 

FT(£)  F (6)  a  =  FT(0)  z  ,  (3-7) 

•which  has  the  solution 

a(0)  =  (  FT(0)  F(£)  )  1  FT(£)  y  (3-8) 

and  a  (£)  =  F+(£)  z  ,  (3-9) 

where  F+(£)  is  the  pseudoinverse  of  F(£) .  Therefore,  f(6)  can  be  written  as 

t(i)  =  ||  1  ~  F(£)  F+(£)  z||2-  (3-10) 


Noting  that  this  is  an  expression  for  the  projection  of  y  onto  the  orthogonal 
complement  of  the  range  (or  column  space)  of  F,  it  is  an  expression  of  the 
variable  projection  functional 

JL  3 

m "  ||  pk  ©  1 1|  •  (3-id 
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3.2  Prediction  Filter  Variable  Projection  Functional 

Given  the  s-plane  poles  for  an  exponential  signal  of  order  M;  i.e., 

si  =  ai  + 

then  the  z-plane  poles  are  calculated  as 

z^  =  exp{s^T},  i=l,2,...,M. 

The  z-transform  of  the  ideal  signal  will  thus  contain  the  following  polynomial 
in  the  denominator 

B(z)  =  (z-z1)(z-z2)»«»(z-zM) 

=  z^  +  b^z^  ^  +  bgZ^  ^  +  •••  +  b^_^z  +  bjj  .  (3-12) 

The  ideal  signal  therefore  satisfies  the  homogeneous  difference  equation 

Xi+M  *  bl  Xi+M-1  +  b2  xi+M-2  +  *"  +  bU  xi  “  0  > 
alternatively  written 

bM  Xi-M  +  bU-l  Xi-M+I  +  •"  *  bl  xi-l  +  xi  "  0  '  <3-14> 

We  can  view  the  first  M  terms  in  this  difference  equation  as  forming  an 
estimate  (prediction)  of  the  present  signal  sample  based  on  the  last  M 
samples.  The  coefficents  are,  for  this  reason,  often  called  prediction 

coefficients. 
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Since  the  above  difference  equations  represent  a  convolution  betw  '.en  the 
prediction  coefficient  vector  and  the  signal  vector,  we  can  now  define  the 
(N-M)XN  convolution  matrix 


such  that  the  rows  of  B  annihilate  the  ideal  signal  x,  and  such  that 

B  x  =  0.  (3-15) 

Also,  since  each  column  of  the  basis  function  matrix  F  also  satisfies  the 
same  homogeneous  difference  equation,  F  is  also  annihilated  by  the  rows  of  B, 
and  we  have 

B  F  =  0.  (3-16) 

Since  matrix  B  has  full  row  rank,  its  rows  span  an  (N-M) -dimensional 
N 

subspace  in  R  .  Similarly,  since  matrix  F  has  full  column  rank,  its  columns 

N 

span  an  M-dimensional  subspace  in  R  .  Because  the  rows  of  B  must  be 

orthogonal  to  the  columns  of  F  (as  dictated  by  the  equation  BF  =  0) ,  and 

because  the  dimensions  of  the  respective  subspaces  sum  to  the  dimension  of  the 
N 

vector  space  R  ,  the  row  space  of  B  and  the  column  space  of  F  must  be 
orthogonal  complements  of  each  other. 


Since  the  row  space  of  B  is  the  same  as  the  orthogonal  complement  of  the 
column  space  of  F,  then  the  projector  onto  the  row  space  of  B  is  the  same  as 
the  projector  onto  the  orthogonal  complement  of  the  column  space  of  F:  i.e., 


(3-17) 
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and  ire  can  define  an  error  norm  whose  only  unknown  parameters  are  the 
prediction  coefficients  b.  This  error  norm  is  defined  as 

2 

m  - 1|  jPi  ||  •  (3-18) 

Since  this  error  norm  follows  deterministically  from  the  ML  error  norm  for  the 
signal  poles,  we  may  note  the  invariance  principle  for  the  ML  estimator  and 
optimize  this  functional  to  obtain  ML  estimates  of  the  prediction 
coefficients.  From  here,  we  can  then  find  the  roots  of  the  prediction 
polynomial  and  transform  these  roots  from  the  z-plane  to  the  s-plane  to  get 
our  ML  pole  estimates. 
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4.0  SIGNAL  POLK  CONSTRAINTS 

Since  our  signal  model  is  to  describe  the  stepped  sine  response  -f  a 
linear  system,  we  constrain  the  model  to  contain  an  undamped  pole  ..  the  known 
excitation  frequency.  Also,  because  we  expect  there  to  be  some  bias 
introduced  into  the  observed  sign  1  during  the  observation  (measurement) 
process,  the  model  should  include  a  DC  pole  to  offset  the  bias. 

In  the  case  of  the  basis  f**.  cbion  VPF,  since  each  pole  corresponds  to  a 
distinct  set  of  basis  functions  (a  single  decaying  exponential  or  a  pair  of 
damped  or  undamped  sinusoids) ,  the  pole  constraints  translate  into  constraints 
on  the  individual  basis  functions.  We  can  introduce  these  basis  function 
constraints  into  the  optimization  process  by  deflating  the  error  space;  i.e., 
by  projecting  the  error  vector  onto  the  orthogonal  complement  of  the  subspace 
spanned  by  the  known  basis  functions.  In  the  first  section  of  this  chapter, 
we  develop  the  theory  behind  this  deflation  process  and  present  a  modified 
basis  function  VPF. 

In  the  case  of  the  prediction  filter  VPF,  we  deal  only  indirectly  with  the 
pole  parameters,  opt  .mizing  instead  over  the  prediction  polynomial 
coefficients.  To  introduce  the  pole  constraints  into  the  coefficient 
optimization  process,  we  factor  the  convolution  matrix  B  into  two  separate 
convolution  matrices:  one  resulting  from  the  known  poles,  and  one  resulting 
from  the  unknown  poles. 


4.1  Deflation  of  the  Least-Squares  Error  Space 

Recall  the  least-squares  error  norm  for  our  signal  model  and  noise;  i.e., 

2 

*(£.&)  =  !!  I  -  F(g)  a  ||  . 


Let  us  partition  the  basis  function  matrix  as 

m  -  [  F, :  *2(®  ]  . 


(4-1) 
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■where  F^  contains  only  the  known  constant  basis  functions  (and  is  therefore 
not  a  function  of  0)  and  Fg  UD  contains  the  basis  functions  corresponding  to 
the  unknown  poles. 

If  we  let  M^  be  the  number  of  constant  basis  functions  and  Mg  be  the 
number  of  unknown  basis  functions,  then  we  can  partition  the  amplitude  vector 
a  as 


a 


(4-2) 


The  M^-vector  a^  contains  the  weighting  coefficients  for  the  known  basis 
functions,  and  the  Mg-vector  ag  contains  the  weighting  coefficints  for  the 
unknown  basis  functions.  Our  signal  model  can  then  be  written  as  the  sum  of 
two  matrix  vector  products;  i.e., 


x(0,a)  =  F(0)  a 


=  Fx  ax  +  Fg(£)  ag  .  (4-3) 

The  estimation  error  vector  is  then 

=  ~  ^*i  — i  “  ^2^0  ^2  >  (4-4) 

and  the  least-squares  error  norm  is 

2 

*(£>%, ag)  =  ||  £(0,3^, ag)  ||  .  (4-5) 

To  take  advantage  of  this  signal  structure,  we  first  note  that — given  two 
orthogonal  vectors  e^  and  eg — we  can  decompose  a  least-squares  error  norm 
consisting  of  l..e  sum  of  and  Sg  as 


ii  i i 2  i 

1  1  |  2  1 

1  I  1  2 

11 

1  %  II  *  1 

1  %  ||  • 

(4-6) 
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Therefore,  given  an  arbitrary  N-vector  e,  we  can  decompose  the  squared  norm  of 
e  into  the  sum  of  two  squared  norms  by  projecting  e  onto  orthogonal  subspaces. 
Let  be  the  projector  onto  the  subspace  spanned  by  the  columns  of  the  matrix 
F^.  Then  the  projector  onto  the  orthogonal  complement  of  this  subspace  is 
given  by 


'i  ■  1  -  pi  • 


(4-7) 


Noting  that  I  =  P^+  P^  ,  we  can  then  write 


1  m2 


Pf  e  +  P^  e 


'l«ll  ♦  IK  e 


X  II2 


(4-8) 


Now,  recalling  the  definition  of  the  error  vector  e,  we  get 


< 

xte.a, ,%)  =  |  K  (  *  -  Fi  %  -  f2®)  %  )  1 1 


P1  (  *  -  F1  h  '  F2<®  22  )  II 


-  |  P1  *  “  F1  %  ~  P!  P2®  % 


II  %  ||' 


(4-9) 


where  we  have  noted  that  P^  Fj  =  f^  and  Pj^  F^  =  0.  Noting  that  the  second 
term  in  the  error  norm  is  no  longer  a  function  of  a^,  we  decompose  the  norm  as 


(4-10) 
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where 


l 

0,%>a2)  =  II  P1  1  "  F1  %  "  P1  F2^  -2  II 


i  1  II2 

and  (->-2^  =  1 1  P1  1  "  P1  F2^  ~2  I  I  * 

Substituting  ?1  =  *1  F^  into  the  expression  for  »  we  8efc 

(£.%»%)  =  1 1  Pi  *  "  F1  %  "  F1  Fl+  F2(-}  -2  1 1 

2 

=  I  i  Pl  I  -  pl  (  %  +  V  P2®  %  }  1 1 


(4-11) 


where  b  =  ^  +  F1+  F2(g)  S2  .  We  can  minimize  ^  (£>%»%)  to  zero  by  solving 
the  consistent  system  of  equations 


?!  b  =  Pj  Y 


(4-12) 


to  yield 


(4-13) 
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+  +  + 

Here  we  have  noted  that  F^  =  F^  .  Clearly,  then  (given  estimates  for 

0  and  a2) ,  can  minimize^  ho  zero  by  letting 

~1  =  Fl+  (  1  “  F2^  %  )  '  (4-14) 

Given  that  we  can  minimize  ^  to  zero  for  any  estimates  of  £  and  a2,  then 
minimization  of  x  reduces  to  minimization  of 


1  1  1 1 2 
P1  *-P!  F2®  %  II  • 


(4-15) 


If  we  now  define 


1 

I  =  P1  XL 


(4-16) 


and 


1 

G(£)  =  Px  F2(£)  , 


then  the  least-squares  error  norm  becomes 


(4-17) 


2 

X2^2)  =  ||  v  -  G(£)  %  ||  .  (4-18) 

Following  the  same  arguments  as  in  the  developments  of  the  previous  chapter, 
we  can  now  optimize  with  respect  to  £  independently  of  a2  by  defining  a 
(constrained)  VPF  as 


4 

1*2(9)  =  1 1  v  -  G(9)  B*(0)  y  ||' 


i 


P,  f/n  v 

~G  -i- 


(A  1(1) 
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If  we  now  define  the  three  coefficent  vectors 


and 


T 

t  "  l  1  bl  b2  •"  hi  J  ■ 

-  =  l  1  C1  c2  ***  CM1  ]  ’ 

u  =  [  1  ut  u2  •••  ]  , 


then  we  note  that  b  can  be  obtained  as  the  vector  convolution  of  c  and  u; 
i.e. , 


b  =  u  *  c  .  (4-23) 

The  (N-M)XN  convolution  matrix  B  can  therefore  be  factored  as  the  product  of 
two  convolution  matrices :  the  (N-M) X (N-M^)  matrix  U  (whose  rows  contain  the 
elements  of  vector  u)  and  the  (N-M^)XN  matrix  C  (whose  rows  contain  the 
elements  of  vector  c) ;  i.e., 


B  =  U  C  . 


(4-24) 


Since  the  matrix  C  contains  coefficients  corresponding  to  constant  poles, 
C  itself  will  be  constant,  so  that 


9C 

-  =0  ,  i  =  1, . . . ,M  . 

3b. 

l 


Therefore,  we  have 


3B  3U 

-  =  -  C  , 

3b.  3u. 

i  i 


(4-25) 


(4-26) 


23 


AINSLEIGH  &  GEORGE 


where  we  now  differentiate  only  with  respect  to  the  unknown  parameters  in  u. 

If  we  adopt  the  operator  notation 

¥•>  ■ 

Eq.  (4-12)  then  becomes 

D.(B)  =  D.(U)  C  .  (4-28) 

It  should  be  noted  that  premultiplication  of  an  arbitrary  vector  w  by 

either  matrix  B,  U,  or  C  represents  the  full-overlap  elements  of  the 

convolution  of  w  with  the  vector  b,  u,  or  c,  respectively.  For  instance,  if  w 

is  an  N-vector,  then  the  convolution  b*w  is  an  (N+M) -vector  with  2M  edge 

effect  or  transient  elements  and  with  N-M  full-overlap  elements  corresponding 

T  T  T 

to  the  matrix-vector  product  Bw.  The  products  B  w,  1)  w,  and  C  w  also 
represent  convolutions,  but  with  both  the  full-overlap  and  edge  effect 
elements  present  and  with  the  elements  of  b,  u,  and  c  reversed. 


a* 


9u. 


(4-27) 
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5.0  DERIVATIVES  OF  THE  VARIABLE  PROJECTION  FUNCTIONALS 

In  this  chapter,  we  derive  the  partial  derivatives;  i.e.,  elements  of  the 
Jacobian  matrix,  'gradient  vector,  and  Hessian  matrix  for  the  two  variable 
projection  functionals.  We  first  parallel  Golub  and  Pereyra  [1]  in  deriving 
the  typical  column  of  the  Jacobian  matrix  and  element  of  the  gradient  vector 
of  trhe  signal  basis  VPF.  We  then  extend  previous  work  by  introducing  the 
Hessian  for  this  VPF.  .Following  this,  we  repeat  the  Jacobian,  gradient,  and 
Hessian  derivations  for  the  prediction  filter  VPF. 

Throughout  this  work,  our  notation  is  different  from  that  of  previous 
investigators  in  that  we  retain  the  index  of  differentiation  and  derive  all  of 
the  derivatives  as  partial  derivatives.  Golub  and  Pereyra  [1] ,  on  the  other 
hand,  introduced  the  Frechet  derivative  of  the  basis  function  matrix,  which  is 
an  array  of  partial  derivative  matrices  (a  simplified  view  of  a  valence  three 
tensor) ,  allowing  them  to  derive  the  gradient  without  having  to  resort  to  full 
tensor  notation.  In  Appendix  E,  we  examine  this  simplified  tensor  notation. 
When  attempting  to  form  the  Hessian  matrix  in  this  simplified  notation, 
however,  one  is  lead  to  attempt  the  multiplication  of  two  three-dimensional 
arrays,  which  is  undefined  in  the  simplified  notation.  The  problem  could  be 
overcome  by  introducing  the  full-blown  tensor  notation,  but  the  alternative 
developed  here  seemed  simpler  and  clearer.  By  retaining  the  indexing  and 
partial  derivatives,  only  the  regular  rules  of  linear  algebra  need  be 
observed.  Furthermore,  with  the  indexing  retained,  the  equations  more  nearly 
reflect  the  computer  code  necessary  for  software  implementation  of  the 
algorithms . 


5.1  Derivatives  of  the  Signal  Basis  Variable  Projection  Functional 

We  now  introduce  the  derivatives  for  the  signal  basis  VPF.  The  first  two 
subsections  are  tutorial  in  that  they  expound  upon  the  work  performed  by  Golub 
and  Pereyra  [1] .  These  draw  upon  the  derivative  of  the  column  space  projector 
described  in  Appendix  D.  The  third  subsection,  which  introduces  the  Hessian 
matrix,  presents  material  that  has  not  been  encountered  in  the  literature. 
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JACOBIAN  MATRIX 


Recall  the  definition  of  the  signal  basis  YPF 


1 m  -  II  r/i  II2- 


The  error  vector  in  this  case  is 


e(£)  =Pn  I  • 


(5-1) 


The  typical  column  of  the  Jacobian  matrix  is  therefore 


4  =  D.(s) 

-  Di(  Tr  *  ) 

■  -  W  i  • 

Substituting  Eq.  (D-4)  for  the  partial  derivative  of  the  projector,  we  get  the 
desired  expression 


4  «  -  {  Pp1  4(F)  F+  +  (F+)T  D.(FT)  p/  }  i 


=  -  p/  D.(F)  F+  i  -  (F+)T  D.(FT)  Pp1  i 


(5-2) 


GRADIENT  VECTOR 


We  derive  an  expression  for  the  gradient  by  noting  Eq.  (0-9);  i.e, 

=  2  /  ^(§)  =2  e^  J^.  Substituting  Eqs.  (5-1)  and  (5-2)  from  above, 
this  becomes 
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=  -  2  (  Ff.1  I  }T  {  Pp1  D.(F)  F*  I  ♦  (F+)T  D-d1)  Pp1  x  } 

-  -  2  XT  Pp1  »i(F)  P+  X  -  IT  Pp1  (F+)T  D£CFT)  Pp1  I  ,  (5-3) 

where,  in  the  first  term,  we  have  noted  that  the  projector  is  symmetric  and 
idempotent.  Now,  since 

Pp1  (P*)T  -  (  (F+)  Pp1  )T 

=  {  F+  (  1  -  F  F+  )  }T 
=  (  F*  -  F*  F  F*  )T  =  0  , 

the  second  term  in  Eq.  (5-3)  vanishes,  leaving  the  desired  expression  for  the 
i’th  element  of  the  gradient  of  the  signal  basis  YPF,  which  is 


»i(F)  ?  I  • 


(5-4) 


EESSIAN  MATRIX 


The  typical  element  of  the  Hessian  matrix  is  obtained  by  differentiating 
the  i’th  element  of  the  gradient  with  respect  to  the  j’th  parameter;  i.e., 

=  Dj (g^) .  Substituting  Eq.  (5-4)  for  the  gradient  element,  we  obtain 


27 


AINSLEIGH  &  GEORGE 


Applying  the  product  rule  of  differentiation  then  yields 


-  2  yT  D.(  PFX  )  D.(F)  F+  y  -  2  yT  p/  D?.(F)  F+  y 


TnJ- 


-  2  y1  Pp  3.(F)  D. (F+)  y  . 


(5-5) 


Here,  Df ^ (F)  is  the  second  partial  derivative  of  the  basis  function  matrix 
with  respect  to  parameters  0 ^  and  9y  Substituting  Eqs.  (D-4)  and  (D— 11)  for 
the  derivatives  of  the  projector  and  the  pseudoinverse  and  noting  that 


Di(  V  )  -  -  VV  - 


we  obtain 


Hij  = 2  *T  (  yF)  F+  +  (p+)T  VfT)  Pp1 )  VF)  F+  i 

-  2  yT  Pp1  D^. (F)  F+  I  -  2  yT  Pp1  D.(F)  (  pP1  D. (FT)  (F+)T  F4 


-  F+  D.(F)  F+  +  F+  (F+)T  D.(FT)  Pp1  )  y 


=  2  yT  Pp1  D. (F)  F+  D.(F)  F+  y  +  2  yT  (F+)T  D. (FT)  Pp1  D.  (F)  F+  y 

-  2  yT  Pp1  D*. (F)  F+  y  -  2  yT  Pp1  D.(F)  pP1  D. (FT)  (F+)T  F+  y  (5-6) 

+  2  yT  Pp^  D.  (F)  F+  D.  (F)  F+  y  -  2  yT  D.(F)  F+  (F+)T  D.  (FT)  P^  y  . 

A  very  interesting  case  is  when  the  basis  function  matrix  has  full  column 
rank.  In  this  case,  since  there  are  M  independent  columns,  there  must  also  be 

M  independent  rows,  so  that  the  rows  of  F  span  the  entire  M-space  in  which 

they  lie.  The  null  space  of  F  therefore  contains  only  the  zero  vector.  In 
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this  case,  the  projector  onto  the  row  space  is  the  identity  matrix;  i.e., 
pP  =  Iy  The  projector  onto  the  orthogonal  complement  of  the  row  space  (the 
projector  onto  the  null  space)  is  therefore  the  zero  matrix;  i.e., 


=  0  . 


The. fourth  term  in  Eq.  (5-6),  which  contains  the  projector  onto  the  null  space 
of  F,.  therefore  vanishes  in  the  full  rank  case,  leaving  the  following 
expression: 


1  T 

Hi5  =  2  (  Pp  I  )  (  Dj(P)  F+  D.(F)  -  D?.(F)  +  D.(F)  F+  D.(F)  )  (  F+  y  ] 


(  (F+)T  D.(FT)  Pp  I 

)T  (  (F+)T  D.(FT)  p/i) 

f  1  +  •JT  f 

1  + 

K  v>  *  *■}  { 

pp  f  X  J  • 

(5-7) 

In  the  stepped  sine  response  signal  modeling  problem,  the  rank  of  the  basis 
function  matrix  is  determined  prior  to  the  pole  optimization.  The  basis 
function  matrix  can  therefore  be  assumed  full  rank  during  the  optimization 
process,  and  Eq.  (5-7)  is  the  applicable  equation  for  the  Hessian.  Chapter  7 
contains  an  implementation  of  Eq.  (5-7)  based  on  the  factorization  of  the 
basis  function  matrix. 


5.2  Derivatives  of  the  Prediction  Filter  Variable  Projection  Functional 

We  now  present  the  derivatives  of  the  prediction  filter  VPF.  While  these 
developments  parallel  those  given  in  the  previous  section,  they  have  not  been 
encountered  in  the  literature  in  their  present  form.  Here,  again,  we  first 
derive  the  Jacobian  matrix  by  noting  the  partial  derivative  of  the  row  space 
projector  given  in  Appendix  D.  We  then  derive  the  gradient  vector,  and  by 
differentiating  the  gradient,  we  derive  the  Hessian  matrix. 
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JACOBIAN  MATRIX 

The  definition  of  the  prediction  filter  VPF  is 

#oo-  IL^lf. 

so  that  the  error  vector  is 

e(b)  =  fiP  I  .  (5-8) 

The  typical  column  of  the  Jacobian  matrix  is  therefore  JL  =  D^)  (e)  = 
Substituting  Eq.  (D— 5)  for  the  partial  derivative  of  the  projector,  we  get  the 
desired  expression 


J.  =  {  B+  D.  (B)  fiP  +  (  B+  D.  (B)  0P  )  }  2 


=  B+  D.(B)  gP"  i  ♦  gP"  D.(BT)  (B+)T  i  . 


(5-9) 


GRADIENT  VECTOR 

To  derive  an  expression  for  the  gradient,  we  again  note  Eq.  (0-8);  i.e, 

T  T  . 

g^  =  2  e  D^(e)  =  2  e  J^.  Substituting  Eqs.  (5-8)  and  (5-9)  from  above,  this 
becomes 


T  JL  IT 

*i  -  2  (  BP  I  )  {  B+  Di<»)  BP  +  (  B+  Bi<B)  BP  ]  }  I 


=  2  iT  BP  B+  D.  (B)  gP1  I  +2  IT  BP  gP1  D.  (BT)  (B+)T  I 


The  two  projectors  in  the  second  term  annihilate  each  other;  i.e., 
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Also,  the  columns  of  the  pseudoinverse  of  B  span  the  same  space  as  that 
spanned  by  the  rows  of  B,  so  that  the  pseudoinverse  is  unaffected  by  the 
projector  onto  the  row  space.  This  becomes  evident  by  writing 


HESSIAN  MATRIX 

Proceeding  as  before,  we  obtain  an  expression  for  the  typical  element  of 
the  Hessian  matrix  by  differentiating  the  i’th  element  of  the  gradient  vector 
=  Dj (g^) •  Substituting  Eq.  (5-10)  for  the  gradient,  we  obtain 

H..  =  D.(  2  iT  B+  D.(B)  gP1  z) 

=  2  zT  D.(  B+  D.(B)  BP  j1!  . 

Applying  the  product  rule  of  differentiation  then  yields 


H..  =  2  zT  D.(B+)  D.(B)  BP  Z  +  2  zT  B+  D? .  (B)  fiP  £ 


2  zT  B+  D.(B)  D.(  gP1  )  z  •  (5-11) 


Here,  (B)  is  the  second  partial  derivative  of  the  prediction  filter 
convolution  matrix  with  respect  to  parameters  b^  and  b^ .  Since  the  elements 
of  the  convolution  matrix  are  the  prediction  coefficients  themselves,  the 
first  derivatives  will  have  ones  as  the  only  nonzero  elements.  The  second 
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derivative  of  the  convolution  matrix,  and  thus  the  second  term  in  Eq.  (5-11), 
therefore  vanishes.  Substituting  Eqs.  (D— 5)  and  (D— 11)  for  the  derivatives  of 
the  projector  and  the  pseudoinverse,  and  noting  that 

Dj  [  BP  )  =  "  Dj  V)  » 

we  obtain 

Hij  =  2  ^  (  Bpi  Dj  (bT)  (B+)T  B+  ~  B+  Dj  (B)  B+ 

+  B+  (B+)TD.(BT)  p/)  D.(B)BPii 

-  2  iT  B+  D.(B)  (  B+  D.(B)  gP1  +  gP1  D . (BT)  (B+)T  )  x 

=  •  Z  bP1  D.(BT)  (B+)T  B+  D.(B)  gP1  v  -  2  zT  B+  D.(B)  B+  D.(B)  gP1  x 
+  2  iT  B+  (B+)T  D.  (BT)  Pg1  D.(B)  gP1  i 

-  2  iT  B+  D.(B)  B+  D.(B)  gP1  X  -  2  XT  B+  D.(B)  gP1  D . (BT)  (B+)T  x  . 

(5-12) 

Due  to  the  structure  of  the  convolution  matrix,  its  rows  are  necessarily 
independent;  i.e.,  B  has  full  row  rank.  Since  there  are  (N-M)  independent 
rows,  there  must  be  (N-M)  independent  columns,  so  that  the  columns  of  B  span 
the  entire  (N-M) -space  in  which  they  lie.  The  projector  onto  the  column  space 
is  therefore  the  identity  matrix;  i.e.,  Pg  -  I.  The  projector  onto  the 
orthogonal  complement  of  the  column  space  is  therefore  the  zero  matrix;  i.e., 
Pg^*  =  0.  The  third  term  in  Eq.  (5-12),  which  contains  the  projector  onto  the 
orthogonal  complement  of  the  column  space  of  B,  therefore  vanishes  in  this 
case,  leaving  the  following  expression. 
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H. .  =  -  2  (  (B*)T  I  J  (  B.  (B)  B+  B.  (B)  +  B.  (B)  B+  B.  (B)  )  (  flP  I 

l  ^  l 

+  2  (  B+  D.(B)  fiP  x)  (  B+  D.(B)  flP  l  ) 

-  2  (  gP1  D.(BT)  (B+)T  i  }T  (  gP1  D.(BT)  (B+)T  i  )  .  (5-13) 

Chapter  7  contains  an  implementation  of  Eq.  (5-13)  based  on  what  we  call  an  EV 
factorization  of  the  convolution  matrix.  This  factorization,  as  well  as  the 
factorization  mentioned  in  Section  5.3,  will  be  the  subject  of  the  next 
chapter . 
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6.0  ORTHOGONAL  FACTORIZATIONS  FOR  NONLINEAR  OPTIMIZATION 

In  this  chapter,  we  consider  two  versions  of  the  QR  factorization,  which 

will  be  utilized  in  effecting  the  pseudoinverse  and  projection  operators 

necessary  for  the  optimization  strategies.  The  first  version  is  one  in  which 

we  transform  the  basis  function  matrix  from  the  left  such  that  Q  F  =  R  , 

where  Q  is  orthogonal  and  R  is  upper  triangular.  We  will  refer  to  this 

factorization  as  the  QR  factorization,  even  though,  strictly  speaking,  it  is 
T 

the  Q  R  factorization  of  matrix  F. 

In  the  second  case,  we  transform  the  convolution  matrix  from  the  right 
such  that  B  Y  =  R,  where  Y  is  orthogonal  and  R  is  now  lower  triangular.  We 
will  refer  to  this  as  the  RV  factorization.  In  achieving  this  factorization, 
we  will  take  advantage  of  the  special  banded  structure  of  the  convolution 
matrix. 


6.1  QR  Factorisation  of  the  Basis  Function  Matrix 

We  facilitate  the  formation  of  the  YPF  and  its  derivatives  by  factoring 
the  full  rank  NXM  basis  function  matrix  F  as 


Q  F  =  R  = 


where  Q  is  an  orthogonal  NXN  matrix  and 


(e-r 


is  square,  upper  triangular,  and  nonsingular.  We  may  then  write  F  as 
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The  pseudoinverse  of  F  can  now  be  defined  as 

F+  =  R+  Q,  (6-3) 

where  R+  =  R^  *  |  0  j  .  (6-4) 

As  is  shown  in  Appendix  A,  the  projector  onto  the  column  space  of  the  basis 
function  matrix  can  be  formed  from  this  factorization  as 


(6-5) 


where  Iy  is  the  MXM  identity  matrix.  The  projector  onto  the  orthogonal 
complement  of  the  column  space  of  F  is  then 


(6-6) 


As  described  in  Appendix  F,  this  factorization  is  achieved  by  applying 
successive  Householder  reflectors  from  the  left  of  F  so  that  Q  =  H^  •••  Hg  H^, 
where  each  represents  an  elementary  matrix  containing  a  Householder 
reflector.  Here  we  do  not  explicitly  form  the  matrix  Q  or  any  of  the  H^. 
Instead,  we  retain  the  information  necessary  to  reconstruct  the  Householder 
reflectors  when  we  need  to  apply  the  transformations. 


6.2  RY  Factorisation  of  the  Convolution  Matrix 

Here  we  describe  an  efficient  method  for  factoring  the  (N-M)XN  prediction 
filter  convolution  matrix  B.  Since  B  is  already  upper  trapezoidal,  we  can 
apply  Householder  reflectors  from  the  right  to  transform  B  into  a  lower 
triangular  matrix.  This  is  essentially  the  same  factorization  used  in  the 
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complete  orthogonal  factorization  for  rank  deficient  matrices.  The  EV 
factorization  results  in  the  following: 


B  V  =  R  = 


(6-7) 


where 


(N-M)X(N-M) 


is  square,  lower  triangular;  and  nonsingular.  With  this,  we  may  write 


B  =  R  VT  , 


(6-8) 


and  define  the  pseudoinverse  of  B  as 


B 


+ 


Y  R 


(6-9) 


where 


(6-10) 


Given  the  RY  transformation  described  above,  the  projection  operators  onto 
the  row  space  of  B  and  onto  the  orthogonal  complement  of  the  row  space  are 
defined  as 


and 


BP=  B+B 


Br 


(6-11) 


(6-12) 
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Representing  each  Householder  reflector  by  the  elementary  transformation 
matrix  H^,  the  orthogonal  matrix  V  can  be  written  as  the  product  of  the 
individual  elementary  transformations;  i.e.,  as  V  = 
an  important  representation  of  V  since,  when  applying  the  orthogonal 
transformation  represented  by  V,  we  in  fact  apply  the  sequence  of  elementary 
transformations  represented  by  the  H^.  Furthermore,  we  do  not  form  any  of  the 
explicitly  but  retain  the  minimal  amount  of  information  necessary  to 
reconstruct  these  transformations  as  needed. 

The  order  of  the  in  the  above  expression  is  important  because  while  we 
form  V  as  a  sequence  of  transformations  operating  from  the  right  of  a  matrix 
(as  a  postmultiplication,  operating  on  a  row  vector),  we  generally  apply  the 
transformations  to  other  vectors  from  the  left  (as  a  premultiplication, 
operating  on  a  column  vector) .  As  is  shown  explicitly  by  writing  Y  £  = 

H^  Hg  ••*  %  changing  from  postmultiplication  to  premultiplication 

requires  that  the  Householder  transformations  be  applied  in  the  reverse  order. 

The  banded  nature  of  the  convolution  matrix  allows  for  computational 
savings  during  the  orthogonal  factorization  process.  Since  each  row  starts 
out  with  exactly  M  nonzero  elements  to  the  right  of  the  diagonal,  and  since 
these  rows  retain  the  zeros  throughout  the  transformation  process,  each 
Householder  reflector  is  constructed  from  an  (M+l) -vector  as  opposed  to  an 
N-vector.  Since  N  is  usually  much  greater  than  M,  this  can  amount  to 
considerable  savings. 

Furthermore,  each  Householder  reflector  effects  only  the  M  rows 
immediately  following  the  row  from  which  it  was  constructed.  For  example, 
consider  the  i’th  reflector.  If  we  were  to  continue  applying  the  reflector 
beyond  the  (i+M-1) *th  row,  its  effects  would  be  squandered  on  the  zero 
elements  contained  below  the  diagonal;  i.e.,  upon  reaching  these  later  rows, 
it  would  be  transforming  an  (M+l) -dimensional  vector  of  zeros.  The  net  effect 
is  to  reduce  the  number  of  rows  that  must  be  transformed  by  each  reflector 
from  (N-M-i)  to  M. 


H2  %_M.  This  is 
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7.0  CONSTRAINED  MAXIMUM  LIKELIHOOD  ALGORITHMS 

In  this  chapter,  we  describe  how  to  implement  the  kernel  elements  of  the 
constrained  maximum  likelihood  algorithms,  which  utilize  the  methods  of  Gauss, 
Newton,  and  steepest  descent.  In  particular,  we  show  how  to  construct  the 
Jacobian  matrix,  gradient  vector,  and  Hessian  matrix  for  each  of  the 
constrained  VPF’s. 

First  we  present  these  derivatives  for  the  constrained  signal  basis  VPF, 
given  in  Eq.  (4-18)  as  « 


1  2 

-  ||  F„  (0  X  ||  • 

Prior  to  presenting  these  derivatives,  however,  we  will  discuss  an  economical 
way  of  introducing  the  pole  constraints;  i.e.,  of  deflating  the  least-squares 
error  space.  This  method  will  take  advantage  of  the  structure  of  the 
projection  operator  when  written  in  terms  of  the  orthogonal  matrix  from  the  QR 
factorization. 

We  then  will  implement  the  derivatives  of  the  constrained  prediction 
filter  VPF, 


Here,  the  vector  u  contains  the  prediction  coefficients  corresponding  to  the 
unknown  poles.  These  coefficients  axe  the  elements  of  the  matrix  U,  which 
results  from  the  factorization  of  the  convolution  matrix  given  in  Eq.  (4-23) 
as  B  =  U  C. 


7.1  Optimization  of  Exponential  Poles 

We  now  develop  algorithms  for  minimizing  the  constrained  signal  basis  VPF. 
' f-cr  describing  the  implementation  of  the  pole  constraints,  we  describe  the 
formation  of  the  Jacobian  matrix  and  note  a  simplified  Gauss-Newton  algorithm 
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introduced  by  Kaufman  [8]  •  We  then  discuss  the  the  gradient  vector  and  the 
Hessian  matrix,  which  are  necessary  for  implementing  the  Newton  algorithm. 


IMPLEMENTATION  OF  TEE  POLE  CONSTRAINTS 

In  this  subsection,  we  describe  a  modified  version  of  the  constrained 
basis  function  matrix  G  =  Fo(0) ,  and  the  constrained  observation  vector 

I  |  1  £ 

y  =  Pj  y.  Here  ,  P^  is  the  projector  onto  the  orthogonal  complement  of  the 
column  space  of  the  matrix  (which  contains  the  basis  functions  correspond¬ 
ing  to  the  known  poles)  and  Fg  is  the  matrix  of  estimated  basis  functions. 

Now  recall  the  constrained  least-squares  functional,  given  in  Eq.  (4-6)  as 

1  2 

^2  =  1 1  P1  [  1  “  -2  )  1 1  ' 

If  we  perform  the  factorization 


Fx  =  R  ,  (7-1) 

where  is  an  orthogonal  matrix  and  R  is  an  NXM^  upper  triangular  matrix, 
then  we  can  form  the  projector 


where 


rj 

I  W,, 


I  - 


0  I  °  1 


)M1 

)  N-M* 


The  constrained  least-squares  functional  then  becomes 


-  il  *lT  i  »!  (  x  -  Fjtt)  *2  )  ||  • 

If  we  now  note  the  isometric  properties  of  this  becomes 

2 

*2(&*b)  =  1 1  1  W1  (z-*2(*)%)  II  • 


(7-2) 
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And,  finally,  if  we  partition  as 

»  _  f  hi  1)  Mi 

1  ^  *12  "  N-*l' 

then  our  least-squares  functional  can  be  written  as 


“  1 1  *12  (  *  -  F2<®  %  )  II 


i  I  J2 

=1  V  -  G(0)  »2  I |  • 

Substituting  the  linear  least-squares  estimator  for  ag,  we  get 


It  J-  11^ 

>>2®  =  II  PG  *  II  ■ 

where 

0  -  *12  P2 

(7-3) 

and 

K 

(I 

•  J- 

to 

K 

(7-4) 

To  effect  these,  we  apply  the  orthogonal  matrix  to  the  columns  of  Fg  and  to 
y,  then  discard  the  first  rows  of  anc*  elements  of  W^y, 

leaving  the  (N-M^)XMg  matrix  G  and  the  (N-M^) -vector  v. 
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PSEUDOINVERSE  AND  PROJECTOR  DEFINITIONS 

Throughout  the  remainder  of  this  section,  we  use  results  from  the  QR 
factorization  of  the  constrained  basis  funtion  matrix,  given  by 


q  g  =  r  , 


(7-5) 


In  particular,  we  will  use  the  following  definitions  for  the  pseudoinverse  and 
projection  operators: 


G+=  R+  q  , 


°] . 


(7-6) 


PG  =  Ix  q 


T  f  1  I  °  P  M2 

1  1  0  I  0  JD  N-M 


(7-7) 


1  x 

pg  =  Qt  i2  Q  > 


(7-8) 


JACOBIAN  MATRIX  CALCULATION 

The  Jacobian  matrix  is  defined  as  the  derivative  of  the  error  vector  with 
respect  to  the  parameter  vector.  If  we  take  the  partial  derivative  with 
respect  to  a  single  parameter,  then  we  get  a  column  of  the  Jacobian  matrix. 

An  expression  for  this,  given  in  Eq.  (5-2),  is 

k  =  ~  *Q  Di(G)  G+  *  "  (G+)T  Di((jT)  *G  2  • 
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Substituting  the  definitions  for  the  pseudoinverse  and  projector,  we  get 

k  =  -  h  q  Di(G)  R+  Q  ^  ^  (r+)T  V6*)  QT  h  *  *  >  (7-Q) 

and/or  J.  =  -  QT  f,I  gQ  D  |G)  R+  1}  v  +  (R+)T  D  £GT)  QT  I  gQ  v  )}  .  (7-10) 


If  we  now  define  the  N-vectors 


£u  =  *  Di(G)  R+  *  *  (7-H) 

and  z.2  =  (R+)T  D.(GT)  QT  Ig  Q  v  ,  (7-12) 

then  the  typical  column  of  the  Jacobian  matrix  becomes 

k  -  '  »T  (  k  *  %2  )  '  <7-13> 

At  this  point,  it  is  convenient  to  define  the  (N-M^) -vector 

w  =  Q  y  (7-14) 

so  that  Eq.  (7-4)  then  becomes 

Sil  =  Q  {  \(G)  {  R+  w)  }  .  (7-15) 


Here  we  have  included  braces  to  denote  the  order  in  which  we  perform  the 
operations.  First,  we  perform  back  substitution  of  the  MgXMg  matrix  R^  with 
the  first  Mg  elements  of  w  to  yield  the  Mg-vector  x  =  R  w  .  Since  D^(G)  is 
an  (N-M^)XMg  partial  derivative  matrix  with  only  one  or  two  nonzero  columns, 
the  next  step  will  involve  some  indexing  to  selectively  multiply  the  nonzero 
elements  of  each  row  by  the  coresponding  elements  of  x  to  yield  the  (N-M^)- 
Yector  D^(G)  R  w  .  Next,  we  apply  the  orthogon^.]  matrix  Q  to  yield  z.^. 
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In  evaluating  z^g,  we  substitute  Eq.  (7-15)  into  Eq.  (7-13)  to  yield 

%2  ’  (*+>T{  MgT>  {  «T  h  *  }  }  ' 

Here,  we’ve  again  used  braces  to  indicate  the  order  of  operations.  We  first 
replace  the  first  Mg  elements  of  vector  w  with  zeros  and  apply  to  w  the 
transpose  of  the  orthogonal  matrix  Q.  We  then  form  the  inner  product  between 
the  nonzero  columns  of  D.(G);  i.e.,  those  which  correspond  to  the  i’th 
parameter  and  the  N-vector  Q  ^  w.  These  inner  products  are  the  nonzero 

T  T 

elements  of  the  M-Yector  D. (G  )  Q  I0  w  .  We  then  perform  forward 

**■  "  T 

substitution  of  the  lower  triangular  matrix  R. x  with  the  first  M  elements  of 
T  T  .  1 

the  vector  D^(G  )  Q  Ig  w  to  yield  the  Mg  nonzero  elements  of  z^g. 

To  complete  the  i’th  column  of  the  Jacobian  matrix,  we  would  define  an 
(N-M^) -vector  z^,  whose  first  Mg  elements  are  the  elements  of  z^g  and  whose 
last  N-M  elements  are  the  last  N-M  elements  of  z.^,  and  then  apply  to  this 
vector  the  transpose  of  the  orthogonal  matrix  Q.  Since,  however,  we  normally 
use  the  Jacobian  matrix  for  approximating  the  Hessian  matrix  as 


H  !  2  J  J  , 


(7-16) 


this  leading  matrix  Q  will  cancel  itself  in  the  product,  so  that  the  final 

T 

step  of  applying  matrix  Q  can  be  skipped.  Using  this  approximation  to  the 

Hessian,  the  approximate  Newton  direction  (the  Gauss-Newton  direction)  can  be 

obtained  (see  Appendix  C)  by  solving  the  linear  least-squares  problem 

T 

J  d  =  -e.  Here,  we  note  that  the  leading  matrix  Q  appears  in  the  definitions 

T 

of  both  the  Jacobian  matrix  and  the  error  vector  e  =  Q  Q  i >  so  that  it 
should  be  neglected  when  the  Gauss-Newton  direction  is  the  desired  end  result. 


A  much  simpler  Gauss-Newton  algorithm,  however,  has  been  devised  by 


Kaufman  [8]  that  allows  us  to  avoid  computing  f  which  is  the 
computationally  more  costly  of  the  two  z^  terms.  In  a  manner  similar  to  the 
introduction  of  the  pole  constraints,  Kaufman’s  algorithm  takes  advantage  of 
the  structure  of  the  projection  operator  when  constructed  from  the  QR 
factorization  of  the  basis  function  matrix.  This  is  the  topic  of  the  next 
subsection. 
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MODIFIED  GAUSS-NEWTON  ALGORITHM 

Noting  that  the  matrix  Q  is  isometric  (i.e.,  length-preserving),  we  know 

that 


1  n2 

pg  i 


1  1.2 


=  |  Q  QT  l2  Q  I 


*2^ 


(7-17) 


We  can  therefore  define  the  modified  variable  projection  functional 


=  , 


(7-18) 


where  we  have  partitioned  Q  as 


«l  = 


A 

«2 


} 


N-1L 


(7-19) 


While  the  partial  derivative  of  Qg  is  dependent  upon  the  orthogonalization 
process  in  which  the  matrix  Q  is  determined  (and  therefore  is  not  unique) , 
Kaufman  [8]  derives  the  following  general  formula  whose  results  (though 
nonunique)  are  similar  within  an  orthogonal  transformation: 

VOg)  =  -  Qg  d.  (G)  r+  q  +  r  q,  (7-20) 

where  +  T  =  0.  (7-21) 

Since  the  matrix  T  is  not  unique,  neither  is  D.^^).  We  can,  however,  choose 
F  -  0,  which  certainly  satisfies  Eq.  (7-21),  leaving 
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\  =  yiy  v  =  -  ^  D.(G)  R+  (j  v  ,  (7-22) 

which  is  just  the  last  N-M  elements  of  the  N-vector  z.^  from  the  previous 
subsection. 


For  this  modified  signal  basis  YPF,  the  error  vector  is 


Defining  the  matrix  J,  whose  columns  are  the  the  equation  for  the 
modified  Gauss-Newton  direction  becomes 

d  -  3+  «2  i  • 


GRADIENT  VECTOR  CALCULATION 


The  typical  element  of  the  gradient  vector  is  given  in  Eq.  (5-4)  as 

gi  =  -  2  vT  Di(G)  G+  v  . 

Substituting  for  the  pseudoinverse  and  projector,  we  get 

g.  -  -  2  vT  QT  12  q  D.  (G)  R+  v 

=  -  2  wT  Ig  zu  ,  (7-23) 


where  w  and  z. ^  are  formed  as  described  in  the  previous  subsection. 

HESSIAN  MATRIX  CALCULATION 


We  begin  by  recalling  the  expression  from  Chapter  5  for  the  typical  element  of 
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the  Hessian  matrix,  which  is  given  by 

_L  T 

Hij  =  2  (  PG  I  )  [  (0)  G+  D.(G)  -  D?j  (0)  *  D.(G)  G*  D.  (G)  )  (  G+  v  ) 

-  2  (  (GY  D.  (G1')  Pg  \  ]  [  (G+)T  D.(GT)  v  ) 

+  2  (PG  D.(G)  G+  v  j  (pG  D.(G)V  v)  . 

If  we  substitute  the  definitions  for  the  pseudoinverse  and  projector  in  terms 
of  the  orthogonal  factorization,  we  get 

T 

Hij  =  2  (  »T  *  )  (  Dj  <G)  R+  «  B.  (G)  -  D?.  (G)  +  D.  (G)  R+  I)  D.  (G)  )  E*(, 

T  T 

-  2  {  (  R+  Q  ]  D.(GT)  QT  I2  Q  V  }  (  R+  Q  ]  D. (GT)  QT  Ig  Q  V 

T 

*  2  (  1T  I2  «  b.(G)  r*  q  v  )  qT  ^  q  D.(G)  R+  q  v 

=  2  »T  Ij  (  Q  D.  (G)  q  D.  (G)  -  q  D?.  (G)  ♦  q  D.  (G) 

-  2  (  C*+)T  (Gt)  qT  l2  W  )T  q  qT  (  (r*)t  D.  (gt) 

T 

♦ 2  (  h  q  »j(G)  »+ » )  « qT  (  \  q  BJG)  r+  » ) 

where  w  is  defined  the  same  way  as  during  the  Jacobian  calculation.  Note  that 
the  product  QQ  will  become  identity  in  the  last  two  terms.  Let  us  also 
recall  our  previous  definitions  of  x,  z.  ,  and  z._;  i.e. 


R+  Q  Dj  (G)  )  E+  w 

h  *) 

,  (7-24) 
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x  =  R+  w, 


=  4  D.(G)  R+  w  , 

and  zi2  =  (R+)T  D.(GT)  QT  ^  w  . 

We  now  can  write  the  typical  element  of  the  Hessian  as 


H 


I.  .  =  2  (  zT0  z. 
13  l  -j2  -1 


j2  -il  +  -i2  -jl 


T 

-i2  -j2 


-  2  wT  ^  Q  D?.(G)  x  .  (7-25) 

Here  we  will  need  to  perform  some  special  indexing  to  achieve  the 
multiplication  by  the  second  partial  derivative  matrices.  Otherwise,  this 
last  term  is  calculated  straightforwardly.  Having  obtained  the  Hessian 
matrix,  the  Newton  direction  is  the  solution  to  the  equation  H  d  =  -  g,  which 
is  a  square  symmetric  system  that  can  be  solved  efficiently  using  the  LU 
decomposition. 


In  the  neighborhood  of  the  minimum  (where  we  are  most  likely  to  use 
Newton’s  method),  the  functional  is  approximately  quadratic  so  that  a  line 
search  to  determine  step  size  is  not  necessary;  i.e.,  we  set  the  step  size 
equal  to  one. 


7.2  Optimisation  of  Prediction  Filter  Coefficients 

We  will  now  discuss  the  algorithms  for  minimizing  the  prediction  filter 
VPF.  This  path  yields  much  more  efficient  algorithms  because  the  convolution 
matrix  is  a  linear  function  of  the  coefficients.  The  formation  of  the 
convolution  matrix  and  its  derivatives  is  therefore  purely  a  matter  of 
indexing.  Furthermore,  the  second  partial  derivatives  of  the  convolution 
matrix  vanish,  thus  simplifying  the  Hessian  calculation.  There  does  not, 
however,  appear  to  be  a  Gauss-Newton  algorithm  which  is  equivalent  to  the 
Kaufman  algorithm. 
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Because  forming  the  Hessian  matrix  is  only  slightly  more  costly  than 
forming  the  Jacobian  matrix  for  this  error  norm,  and  because  Newton’s  method 
generally  does  not  need  a  line  search  to  determine  the  step  size,  Newton’s 
method  is  computationally  more  efficient  than  the  Gauss-Newton  method  when  it 
can  be  used;  i.e.,  when  the  Hessian  is  positive  definite. 


PSEUDOINVERSE  AND  PROJECTOR  DEFINITIONS 

Throughout  the  remainder  of  this  section,  we  will  use  results  from  the  RV 
factorization  of  the  convolution  matrix,  given  by 

B  V  =  R,  R  =  [  R1  |  0  ]  . 

In  particular,  we  will  use  the  following  definitions  for  the  pseudoinverse  and 
projection  operators: 


B  =  V  R 


R+= 


7 


P  =  V  I 
Br  V  A1  V  » 

i  X 

BP  =Vl2YT, 


I 

.  0 
0 

.  0 


-P 

0  J) 

-P 

iJ) 


N-M 

M 


N-M 

M 


JACOBIAN  MATRIX  CALCULATION 

We  begin  by  recalling  the  expression  from  Chapter  5  for  typical  column  of 
the  Jacobian  matrix,  given  by 


J. 

“i 


i 


=  B+  D.(B)  BP  y  +  BP 


1 


D.(BT) 


(p*+)Tz 
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If  we  recall  that  the  partial  derivative  of  the  convolution  matrix  can  be 
factored  as  D^(B)  =  D^(U)  C,  then  we  get 

J.  =  Y  E+  Di(U)  C  Y  Ig  VT  i  +  V  Ig  YT  CT  D-^U7)  (R+)T  VT  i  .  (7-26) 

Let  us  now  define 


zn  =  R+  D.  (U)  C  Y  Ig  YT  I  , 

(7-27) 

and 

z.g  =  YT  CT  D.  (UT)  (R+)T  VT  x  . 

(7-28) 

The  typical  column  of  the  Jacobian  can  then  be  written  as 

4  ■  T  { «u  +  h  «u )  •  (7'29) 

As  before,  we  note  that  the  primary  purpose  for  forming  the  Jacobian  matrix  is 
to  calculate  the  Gauss-Newton  direction  d  by  solving  the  linear  least-squares 
problem  J  d  -  -  e.  We  also  note  that  the  orthogonal  matrix  Y  appears  as  the 
leading  term  on  both  sides  of  the  equation  (i.e.,  in  the  definition  of  both  J 
and  e) ,  so  it  can  be  ignored. 

In  discussing  the  formation  of  the  z^,  an  example  case  will  be  helpful  to 
illustrate  the  manipulations  of  the  convolution  matrices.  We  will  consider 
the  case  with  N  =  12,  M^  =  3,  and  Mg  =  2  (thus  M  =  5) .  Thus,  for  our  example 
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b5  b4  b3  b2  bl  1  0  0  0  0  0  0 

0  b5  b4  b3  b2  bl  1  0  0  0  0  0 

0  0  b5  b4  b3  b2  bi  1  0  0  0  0 

0  0  0  b5  b4  b3  b2  bi  1  0  0  0 

0  0  0  0  bc  b,  b„  b„  b,  1  0  0 

o  4  3  2  1 

0  0  0  0  0  bc  b,  b,  b„  b  1  0 

5  4  3  2  1 

0  0  0  0  0  0  b5  b4  b3  b2  b!  1 


u2  Uj  1  000000 

0u2u1  1  00000 

00  u2u1  1  0000 

000  u2ux  1000 
0000  u2ux  100 
o  0  0  0  0  u2  ux  1  0 

0  0  0  0  0  0  u2  u  1 


c3c2ci  1  00000000 

0  Cg  c2  10000000 

00  c3  c2  Cj  1000000 
000  cg  c2  c.  100000 

0000  Cj  c„  c,  10000 

00000  CgCgC^  1000 

0  0  0  0  0  0  c„  c~  c.  1  00 

O  Z  l 

0000000  c„c0c1  10 
o  o  0  0  0  0  0  0  c3  c2  Cl  i 
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Before  evaluating  the  z^,  it  is  convenient  to  define 

2  =  VT  I  ,  (7-30) 

so  that  the  expression  for  z ^  becomes 

%1  -  {  “i™  {  0  {  V  h  2  }  }  }  • 

As  before,  the  braces  indicate  the  order  of  operations  in  forming  the  vector. 

We  begin  by  setting  to  zero  the  first  N-M  elements  of  the  vector  w  and  by 
applying  the  orthogonal  transformation  represented  by  matrix  V.  We  then 
effect  the  premultiplication  by  matrix  C  as  a  vector  convolution,  without  edge 
effects,  between  Y  Ig  w  and  the  vector 

[  1  cl  c2  ***  ^  ‘ 

If  we  now  examine  the  structure  of  the  partial  derivative  of  matrix  U  with 
respect  to  one  of  the  coefficients,  we  see  that  these  derivative  matrices  are 
merely  (N-M) X (N-M)  identity  matrices  with  zero  columns  added  to  make  an 
(N-M)  X  (N-M*)  matrix.  Shown  below  is  the  partial  derivative  of  matrix  U  with 
respect  to  Ug  for  cur  example. 

100000000 
010000000 
001000000 
000100000 
000010000 
000001000 
C00000100 

If  we  premultiply  an  arbitrary  (N-M^) -vector  x  with  this  partial  derivative, 
we  see  that  the  result  is  simply  to  select  an  (N-M) -element  segment  of  the 
vector  x.  For  our  above  example  and  the  vector 

r  1  T 

x  =  [  xx  x2  •••  xQ  j  , 
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the  resulting  product  is 


The  partial  derivative  of  our  example  matrix  U  with  repect  to  u^  would  contain 
zeros  in  the  first  and  last  column,  and  an  identity  matrix  in  columns  two 
through  nine.  This  would  appear  as 


Extending  this  to  the  general  case,  we  would  obtain 


d„2TO  *  =  [  x,  x2  •  •  •  XN-M  ]  ’ 

V®  -  =  [  X2  X3  •  •  •  XN-M+1  ]  ' 

D1(U)  -  =  [  xM^r  .  xN  M^_1  ]  . 

The  matrix  product  can  therefore  be  effected  simply  by  selecting  the 
appropriate  elements  from  the  (N-M^) -vector  C  V  Ig  w  to  form  the  (N-M) -vector 
D. (U)  C  V  I0  w.  Having  obtained  this  result,  we  now  effect  the 
premultiplication  of  the  matrix  E  by  performing  back  substitution  of  the 
(N-M) X (N-M)  matrix  with  the  first  (N-M)  elements  of  the  product 
D^(U)  G  V  Ig  w  to  yield  the  nonzero  elements  of  z.^. 

The  expression  for  z is 

%2  -  yT  {  «T  {  {  (R+)T  2  }  }  }• 
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We  begin  forming  z ^  by  performing  forward  elimination  of  the  (N-M)  X  (N-M) 
lower  triangular  matrix  with  the  first  (N-M)  elements  of  w  to  yield 

x  =  (R  )  w.  We  then  again  note  the  special  structure  of  the  partial 
derivative  matrices  and  recognize  that  the  transpose  of  each  of  these  matrices 
is  simply  an  (N-M) X (N-M)  identity  matrix  with  zero  rows  added  to  make  an 
(N-M^)X(N-M)  matrix.  Thus  when  we  premultiply  the  above  (N-M) -vector  x  with 
one  of  the  derivative  matrices,  the  effect  is  simply  to  pad  the  vector  with 
zero  elements  to  make  an  (N-M^) -vector;  i.e.,  the  vector 


*  ■  [  xi 


*2  ’  '  •  x7 


vhen  premultiplied  by  DgCU  )  ^rom  our  example  becomes 


T 

D2(UT)  x  =  [  x1  x2  .  .  .  Xg  x7  0  0  ]  . 

For  the  derivative  with  respect  to  u^,  this  product  becomes 

T 

DjO!1)  x  =  [  0  x2  .  .  .  x„  0  ]  . 

For  the  general  case,  with  an  (N-M) -vector  x,  the  resulting  products  are  the 
following  (N-M^) -vectors : 


where  the  total  number  of  zeros  in  each  vector  is  equal  to 
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The  next  step,  after  haying  effected  the  multiplication  of  the  partial 

derivatives,  is  to  premultiply  by  the  transpose  of  the  convolution  matrix  C. 

T 

This  is  equivalent  to  convolving,  with  edge  effects,  D^(U  )  x  with  the  vector 

f  1T 

.  ^  ^+1  ‘  ’  *  C1  1  J 

We  then  apply  the  transpose  of  matrix  V  to  yield  the  result 

GRADIENT  VECTOR  CALCULATION 

Recall  from  Chapter  5  that  the  typical  element  of  the  gradient  vector  is 

T  J- 

g.  =2  Z  B  D.(B)  fiP  i  . 

Substituting  the  definitions  for  the  pseudoinverse  and  projection  operator  in 
terms  of  the  RV  factorization  and  recalling  that  D^(B)  =  D^(U)  C,  this  becomes 

g.=  2  £T  V  R+  D.(U)  CV^V1!.  (7-31) 

T 

If  we  now  substitute  w  for  Y  Zi  we  get 

g.=  2  wT  R+  D.  (U)  C  V  w0.  (7-32) 

1  X  « 

But  we  can  also  recognize  that  this  is  the  same  as 

gi=2w Tz.1,  (7-33) 

where  z^  is  formed  as  described  in  the  previous  subsection. 

HESSIAN  MATRIX  CALCULATION 

Recall  from  Chapter  5  that  the  typical  element  of  the  Hessian  matrix  for 
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the  prediction  filter  VPF  is  given  by 

H. .  =  2  yT  {  [  gP1  D.(BT)  (B+)T  +  B+  D . (B)  B+  ]  D.(B)  gP1 

-  B+  D.(B)  [  B+  D.(B)  BP  i  0P  D‘|(BT)  (B+)T  ]  }  i 

=  2  z  gP1  D. (BT)  (B+)T  B+  D.(B)  gP1!  -  2  /  B+  D . (B)  B+  D.(B)  gP1  x 

-  2  yT  B+  D.(B)  B+  D. (B)  gP  ^  -  2  yT  B+  D.(B)  fiP  D. (BT)  (Bf)T  x  . 

Substituting  the  definitions  for  the  pseudoinverse  and  projection  operator 
in  terms  of  the  BY  factorization,  and  recalling  that  D^(B)  =  D^(U)  C,  this 
becomes 

I..  =  2iTVLVTCTD.  (UT)  (R+)T  yt  Y  R+  D.  (U)  CVLVTi 
13  «  3  X  6 

-  2  xT  Y  R+  D.  (U)  C  Y  R+  D.  (U)  C  V  L  YT  2 

3  1  " 

-  2  xT  V  R+  D£ (U)  C  Y  R+  D.  (U)  C  V  Ig  VT  x 

-  2  iT  Y  R+  Di(U)  C  V  VT  CT  Dj (UT)  (R+)T  VT  x 
Hj.  =  2  /  Ig  VT  CT  D. (UT)  (R+)T  R+  Di(U)  C  Y  ^  w 

-  2  wT  R+  D.  (U)  C  Y  R+  D.  (U)  C  L  w 

3  1  ^ 

-  2  wT  R+  D.  (U)  CVR+D.(U)  CYL* 

1  3  « 

-  2  wT  R+  D.  (U)  C  V  L  VT  CT  D.  (UT)  (R+)T  w  .  (7-34) 

1  z  3 

We  can  further  simplify  this  as 
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H..  =  2  [  R+  D. (U)  C  V  I2  w  ]T  [  E  +D.(U)  C  V  Ig  w  ] 

-  2  [  VT  CT  D.(UT)  (R+)Ts  ]T  [  R+  D.(U)  C  V  ^  w  ] 

-  2  [  VT  CT  Di(UT)  (R+)Tw  ]T  [  R+  (U)  CVIgw] 

-  2  [  I2  VT  CT  D.(UT)  (R+)Tw  ]T  [  Ig  VT  CT  D.(UT)  (R+)T  2  ]»  (7-35) 


H. .  =  2 
13 


{  -  *j2  2: 


.  1  -  Z.n  Z.-  + 

11  “12  ~jl 


4l  %1  -  %2  h  2j2  ^  ’ 


(7-36) 


■where  the  z.,  z.  were  calculated  for  the  Jacobian  matrix.  As  before,  the 
-i  “3 

Newton  direction  is  calculated  from  the  equation  H  d  =  -  g,  and  the  step  is 
taken  with  step  size  equal  to  one. 
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APPENDIX  A  -  REVIEW  OF  LINEAR  LEAST-SQUARES  THEORY 

In  this  appendix,  we  review  some  of  the  elements  of  linear  least-squares 
theory  essential  to  nonlinear  least-squares  optimization.  The  purpose  here  is 
two-fold:  to  elucidate  in  some  detail  the  nature  of  the  projection  operator, 
the  generalized  inverse,  and  the  pseudoinverse;  and  to  describe  the 
application  of  the  QR  factorisation  (and  its  relatives)  to  solving  linear 
least-squares  problems. 

We  will  begin  by  examining  the  orthogonal  projection  operator.  Here, 
following  the  the  work  of  Halmos  [9]  and  Wilkinson  [10] ,  the  properties  of  the 
projection  operator  will  be  discussed.  The  eigenstructur  of  this  operator  is 
also  examined.  We  then  examine  the  properties  of  the  generalized  inverse  for 
finding  general  solutions  and  minimum  norm  solutions  to  consistent  equations 
and  linear  least-squares  problems.  This  will  culminate  in  a  discussion  of  the 
Moore-Penrose  generalized  inverse,  or  pseudoinverse,  which  leads  to  the 
minimum  norm  least-squares  solution. 

Following  this,  we  examine  several  scenarios  in  which  QR  factorization 
techniques  are  applied  to  the  linear  least-squares  problem.  While  only  the 
full  rank  case  is  of  interest  in  the  nonlinear  algorithms  presented  in  this 
paper,  we  examine  the  rank  deficient  and  near  rank  deficient  cases  for 
application  to  other  aspects  of  the  signal  modeling  problem,  namely  the  use  of 
the  complete  orthogonal  factorization  for  effecting  rank  reduction  of  the  data 
matrix  in  the  algorithm  for  obtaining  the  initial  estimates  (see  Ref.  [4]  for 
details) . 

In  the  full  rank  case,  the  QR  factorization  is  seen  to  lead  directly  to 
the  Moore-Penrose  generalized  inverse.  In  the  rank  deficient  case,  we  examine 
the  truncated  QR  factorization  and  the  complete  orthogonal  factorization  as 
o  tlined  by  Hanson  and  Lawson  [11] ,  Golub  and  Pereyra  [5] ,  and  Golub  and  Van 
Loan  [12] .  It  is  seen  that  while  the  g-inverse  formed  using  the  truncated  QR 
factorization  does  not  lead  to  the  pseudoinverse,  the  complete  orthogonal 
factorization  does  achieve  the  minimum  norm  solution  and  thus  provides  an 
alternative  to  the  singular  value  decomposition  for  forming  reduced  rank 
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approximations.  Finally,  the  application  of  the  complete  orthogonal 
factorization  to  nearly  rank  deficient  matrices  is  discussed. 


A.l  The  Orthogonal  Projection  Operator 


Every  projection  operator  is  defined  on  a  particular  linear  manifold 
(subspace)  in  a  finite  dimensional  vector  space.  One  of  the  key  relationships 
between  the  projector  and  the  corresponding  subspace  is  that  the  subspace  is 
invariant  under  the  transformation  represented  by  the  projection  operator; 
i.e.,  the  transformation  operating  on  any  vector  from  the  subspace  will  return 
another  vector  in  the  same  subspace.  Also,  the  projector  represents  the 
identity  transformation  over  this  invariant  subspace.  One  special  class  of 
projectors  is  called  the  orthogonal  projector,  which  must  satisfy  somewhat 
stricter  requirements  regarding  the  associated  subspace.  These  properties  we 
will  now  develop. 

N  N 

Consider  the  N-dimension  vector  space  R  and  a  subspace  S  in  R  .  There 

N 

exists  a  subspace  S,  called  the  orthogonal  complement  of  S,  such  that  R  is 
the  direct  sum  of  S  and  S.  Drawing  from  the  work  of  Halmos  [9] ,  the  following 
definitions  and  Eqs.  (A-l)  through  (A~7)  characterize  the  orthogonal 
projection  operators  associated  with  S. 


Definition  1 


N 

There  exists  an  operator  P  that  maps  every  vector  in  R  onto  the 

subspace  S  by  projecting  along  the  orthogonal  complement  S.  This 

operator  is  called  the  orthogonal  projection  operator  onto  the 
N 

subspace  S  in  R  . 


Definition  2 


JL  N 

There  exists  an  operator,  P  ,  which  maps  every  vector  in  R  onto 

the  complement  subspace  S  by  projecting  along  the  subspace  S. 

This  operator  is  called  the  orthogonal  projection  operator  onto 

N 

the  orthogonal  complement  of  S  in  R  . 


Furthermore,  these  projectors  satisfy  the  relationship 
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1 

P  =  I  -  P  . 


(A— 1) 


For  convenience,  we  will  subsequently  use  the  term  projector  to  mean 
orthogonal  projection  operator ;  i.e.,  we  will  consider  only  those  projectors 
for  which  a  projector  onto  the  orthogonal  complement  subspace  exists  and  is 
defined  by  Eq.  (A— 1) . 


»N 


Now  consider  an  arbitrary  vector  z  e  R  .  We  can  decompose  z  as 
z  =  z^  +  Zg,  where  z^  e  S  and  Zg  e  S.  The  projectors  defined  above  satisfy 
the  following  six  relationships: 


p  %  =  h 


P  =  0 


p  %  =  ° 


P  *2  =22 


P  z  =  z, 


P  2  =  Y 


(A-2) 
(A-3) 
(A-4) 
(A-5) 
(A-6) 
(A— 7) 


From  Eqs.  (A-6)  and  (A-2),  we  see  that 


P  z  =  P  (P  z)  =  P  z1  =  z-  =  P  2, 


so  that 


P2  =  P; 


(A-8) 


i.e.,  the  projector  is  idempotent.  We  see  from  Eqs.  (A-4)  and  (A-7)  that 


P  (P  z)  =  P  ag  =  0 


so  that 


1 

P  P  =  0  . 


(A- 9) 
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Similarly,  from  Eqs.  (A-3)  and  (A-6) ,  we  see  that 

1  1 

P  (P  z)  =  P  zx  =  0 

1 

so  that  P  P  =  0  .  (A- 10) 

Finally,  from  Eqs.  (A-6)  and  (A-7)  and  from  the  definition  of  orthogonality 

(P  z)T  (P  z)  =  z^  z2  =  0  . 

Substituting  Eq.  (A-l)  and  transposing  within  the  parentheses  yields 

zT  PT  (I  -  P)  z  =  0. 

N 

Since  this  is  true  for  all  z  in  R  ,  we  must  have 

PT  (I  -  P)  =  0 

m  m 

so  that  P1  =  Px  P. 

Since  the  right-hand  side  is  symmetric,  we  must  have 

P  =  PT.  (A-ll) 

Note  that  it  is  this  symmetry  separating  the  orthogonal  projection  operator 
from  the  general  projector  which  need  only  be  idempotent.  In  summary,  the 
projectors  P  and  P^  are  idempotent,  symmetric,  and  mutually  annihilating. 

Further  insight  can  be  gained  by  examining  the  eigenstructure  of  the 

projector.  Since  the  subspace  S  is  invariant  under  the  transformation  P,  and 

since  P  represents  the  identity  transformation  over  S,  S  is  spanned  by  a  set 

of  the  eigenvectors  of  P,  each  of  which  corresponds  to  a  unity  eigenvalue  [9] . 

N 

The  remaining  eigenvectors  span  the  complement  of  S  in  R  and  correspond  to 

N 

eigenvalues  of  zero.  The  set  of  all  eigenvectors  of  P  forms  a  basis  for  R  . 

To  show  that  the  eigenvalues  of  P  are  constrained  to  the  values  zero  and 
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one,  we  note  that  any  symmetric  matrix  A  can  be  transformed  [10]  such  that 


UT  A  U  =  D, 


(A-12) 


where  U  is  an  orthogonal  matrix  and  D  =  diag(X^,  . ..,  X^) 
containing  the  eigenvalues  of  A.  The  i’th  column  of  U  is 
corresponding  to  the  i’th  diagonal  element  of  D,  the  i’th 
This  is  shown  clearly  by  writing  Eq.  (A-12)  as  A  U  =  U  D. 
similarity  transformation,  we  may  rewrite  A  as 


is  a  diagonal  matrix 
the  eigenvector 
eigenvalue  of  A. 

With  this 


A  =  U  D  UT. 


(A-13) 


T  T 

Now  let  A  be  idempotent  as  well  as  symmetric,  then  A  A  =  U  D  D  UDU 
2  T 

=  U  D  U  .  But  since  this  must  also  equal  A  as  expressed  in  Eq.  (A-13) ,  we 
must  have 


D2  =  D.  (A- 14) 

Since  the  eigenvalues  of  a  symmetric  matrix  must  be  real,  the  diagonal  matrix 
D  must  contain  only  zeros  and  ones  for  Eq.  (A-14)  to  hold. 

To  see  that  this  eigenstructure  exemplifies  the  operation  of  the 

N 

projector,  consider  the  arbitrary  N-vector  z  in  the  space  R  ,  which  has  a  set 
of  basis  vectors  u^,  i=l,...,N.  We  may  choose  this  set  of  basis  vectors  to  be 
the  eigenvectors  of  the  NxN  matrix  P,  which  is  the  projector  onto  the  (say,  M- 
dimensional)  subspace  S.  We  may  now  write 

(A-15) 

and  z  =  a1  Uj^  +  a2  -2  +  ***  +  ~N*  (A-16) 

Thus  P  z  =  X-  a.,  +  X2  a2  u2  +  •••  +  X^  a^  uN-  (A-17) 


P  u.  =  X.  u.  . 

“1  X  “1 
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M  of  the  N  eigenvalues,  those  corresponding  to  the  eigenvectors  that  span 
the  subspace  S,  will  have  value  unity  while  the  remaining  eigenvalues  will 
have  a  value  of  zero.  We  therefore  have  the  result 

M 

P  2  =  Z  a.  u.  ,  (A-18) 

i=l  J i  J i 

where  the  denote  those  eigenvectors  which  span  S.  Thus,  components  of  z  in 
S  are  unchanged  while  components  in  S  are  annihilated. 

A. 2  The  Generalised  Inverse 

In  general,  an  NXM  matrix  A  is  a  linear  transformation  that  maps  an 
arbitrary  vector  x  from  an  M-dimensional  space  to  an  N-dimensional  space, 
which  is  the  range  (column  space)  of  the  mapping  (matrix) .  We  desire  an 
inverse  transformation  that  will  map  an  N-vector  y  lying  in  the  range  of  A 
back  into  an  M-dimensional  space.  If  the  vector  y  does  not  lie  in  the  range  of 
A,  then  the  inverse  mapping  must  first  approximate  y  with  a  suitable  vector  in 
the  range  of  the  mapping.  Let  us  first  consider  the  case  where  y  lies  in  the 
column  space  of  A  (consistent  equations) . 

For  the  NxM  matrix  A,  the  MXN  matrix  A+  is  a  generalized  inverse 
(g-inverse)  of  A  if  x  =  A+  y  is  a  solution  to  the  equation  A  x  =  y,  for  any  y 
that  makes  the  system  consistent  [13] .  Substituting  the  first  equation  into 
the  second,  we  get  A  A+  y  =  y.  Now  suppose  that  given  an  arbitrary  M-vector 
w,  we  let  y  =  A  w.  Since  this  y  is  generated  from  the  columns  of  A,  it 
clearly  lies  in  the  column  space  of  A.  Substituting  this  into  the  previous 
equation  yields  A  A+  A  w  =  A  w.  In  general,  this  requires  that  the 
generalized  inverse  satisfy 


A  Af  A  =  A.  (A-19) 

In  the  most  unrestricted  sense,  this  is  all  that  is  required  of  a 
g-inverse.  If  we  wish,  however,  to  consider  the  case  of  inconsistent 
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equations,  then  we  must  impose  further  restrictions  that  determine  how  we  wish 
to  approximate  x  before  transforming. 


A. 3  g-inverse  for  Linear  Least-Squares  Solution 

Geometrically,  the  best  approximant  to  x  that  makes  the  system  consistent 
is  the  projection  of  x  onto  the  column  space  of  A,  resulting  in  the  least- 
squares  solution  to  the  system  of  equations.  Thus,  for  the  arbitrary  N-vector 
z,  the  inconsistent  equation  A  x  -  z  can  be  made  consistent  by  premultiplying 
both  sides  by  the  projection  operator  for  the  column  space  of  A,  yielding 
A  x  =  z.  But  the  projection  of  the  columns  of  A  onto  themselves  leaves 
them  unaffected,  so  this  reduces  to 

A  x  =  P^  z  .  (A- 20) 

The  g-inverse  solution  for  x  is  then  x  =  A  P^  z.  Substituting  this  for  x  and 
noting  that  the  projection  operator  is  idempotent,  Eq.  (A- 20)  becomes 

+  Hh 

A  (A  P.  a)  =  (PA  PA)  z,  A  A  %  =  P^  From  this,  we  see  that  the 
generalized  inverse  for  solving  linear  least-squares  problems  must  be  such 
that  A  A  =  P^  is  the  projector  onto  the  column  space  of  A.  This  then 
requires  that  the  product  A  A  be  idempotent  and  symmetric.  That  this  product 
is  idempotent  is  equivalent  to  the  requirement  of  Eq.  (A-19) .  We  do  have  the 
further  restriction,  though,  that  A+  must  satisfy 

T 

(  A  A+  ]  =  A  A+  .  (A-21) 


A. 4  g-inverse  for  UiniauM  Nora  Solution  of  Consistent  Equations 

We  know  from  Section  A. 2  that  the  g-inverse  that  provides  the  solution  for 
the  consistent  system  of  equations 


A  x  =  x 


(A-22) 


must  satisfy  A  A+  A  =  A.  From  this,  it  follows  that  A  -  A  A+  A  =  0.  Thus 
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A  I  -  A  A 


)  =° 


(A-23) 


We  can  therefore  state  that  for  any  z, 


(  I  -  A*  A  ) 


is  a  solution  to  the  homogeneous  system  of  equations  A  x  =  0.  As  is  the  case 

for  a  linear  differential  equation,  the  general  solution  for  the  set  of 

simultaneous  linear  equations  in  Eq.  (A-22)  is  the  sum  of  the  homogeneous 

solution  and  a  particular  solution,  and  can  thus  be  written  x  =  x  +  x,  = 

P  " 

A+x+(i-A+a]z. 

If  we  denote  the  g-inverse  leading  to  the  minimum  norm  solution  as  A*, 
then  we  desire  to  have 


.+  2  , 
Am*  * 


*+  i  +  (  I  -  A+  A  ) 


(A-24) 


for  all  x  and  z*  The  right-hand  side  of  Eq.  (A-24)  can  be  written 


A+  I  +  (  I  -  A+  A]  2  ||2 

"  {  A+  I  ♦  (  I  -  A+  A  )  2  }  {  A+  i  ♦  (  I  -  A+  A  ]  2  }  , 


so  that,  since  all  terms  are  real,  Eq.  (A-24)  becomes 


This  is  a  minimum  when  the  middle  term  is  zero,  which  occurs  when  the 
particular  solution  is  orthogonal  to  the  homogeneous  solution.  In  general, 
this  requires  that 
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(A+)T  (  I  -  A+  A  )  =  0  , 

so  that  (A  )  =  (A  )  A  A.  For  this  to  be  true,  it  is  necessary  and 

sufficient  [13]  that 

A+  A  A+  =  A+  (A-25) 

T 

and  ^  A+  A  j  =  A+  A  .  (A-26) 

We  will  now  show  that  the  product  A+  A  is,  in  fact,  the  projector  onto  the 
row  space  of  A. 


A*  A=  [  A+  a) 


=  at  (a *y 


-  A1  (aV, 


which  is  the  projection  operator  onto  the  columns  of  A  ,  or  the  rows  of  A. 


In  summary,  the  g-inversc  for  obtaining  a  minimum  norm  solution  to  A  x  =  i 
must  be  such  that 


AP  =  P  -  =  A+  A  .  (A-27) 

Let  us  re-examine  the  general  solution,  now  written 

x  =  A+i+[i-ap]z, 

where  substituted  Eq.  (A-27)  into  the  homogeneous  solution.  Recall 

from  Section  A.l  that  (I  -  P)  is  the  projector  onto  the  orthogonal  complement 
of  the  subspace  for  which  P  is  the  projector.  Thus  we  see  that  the 
homogeneous  solution  is  confined  to  the  null  space;  i.e.,  the  orthogonal 
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complement  of  the  row  space  of  A.  Since  the  minimum  norm  solution  must  be 
orthogonal  to  this  homogeneous  solution,  what  we  really  are  striving  for  in 
the  minimum  norm  solution  is  that  solution  lying  in  the  row  space  of  the 
matrix  A. 

A. 5  g-inverse  for  Minimum  Nora  Least-Squares  Solution 

Combining  the  results  of  the  last  two  sections,  we  see  that  the 
g-inverse  for  obtaining  the  minimum  norm  least-squares  solution  [i.e.,  the 
Moore-Penrose  generalized  inverse  (pseudoinverse)]  must  be  such  that 

PA  =  A  A+ 

and  ^P  =  A+  A 

are,  respectively,  the  orthogonal  projectors  onto  the  column  space  and  the  row 
space  of  A.  This  is  equivalent  to  the  following  requirements: 

A  A+  A  =  A  A+  A  A+  =  A+ 

(  A  A*  )T  =  A  A+  (  A+  A  f  =  A*  A. 

It  is  interesting  to  note  that  in  forming  the  minimum  norm  linear  least- 
squares  solution,  we  are  actually  performing  a  three-stage  process.  Starting 
with  the  least-squares  problem 

A  X  s  y  (A-28) 

(where  A  is  not  necessarily  full  rank) ,  we  obtain  the  minimum  norm  solution 

x  =  A+  y.  (A-29) 

Stage  I:  Projection  of  y  onto  the  column  space  of  A  to  obtain  a  consistent 
set  of  equations.  We  form  y  =  A  A+  y  =  P^  y. 
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Stage  II:  Solution  to  the  consistent  set  of  equations  k  x  =  £  to  yield  the 
general  solution  x  =  A+  y  +  [I  -  A+  A]  z,  where  z  is  an  arbitrary  vector  in 


Stage  III:  Projection  of  x  onto  the  row  space  of  A  to  obtain  the  minimum  norm 
solution  (eliminate  the  homogeneous  part  of  the  solution) .  We  form 
x  =  A+  A  x  =  x  =  A+  y. 

We  now  conclude  this  portion  of  our  discussion  of  linear  least-squares 
theory.  The  remainder  of  the  chapter  will  examine  the  QR  factorization  family 
as  it  is  used  for  forming  projection  operators  and  solving  linear  least- 
squares  problems. 


A. 6  QR  Factorisation  of  Full  Rank  Matrices 

Consider  the  NXM  matrix  A  of  rank  r  =  MSN.  There  exists  an  NXN  orthogonal 
matrix  Q,  such  that 


where 


(A-30) 


is  square,  upper  triangular,  and  nonsingular.  With  this,  we  may  write 
T 

A  =  Q  R,  and  define  a  g-inverse  of  A  as 


A+  = 


Q  • 


(A— 31) 
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Recalling  that  =  A  A+, 


the  projection  operator  becomes 


(A-32) 


where  Iy  is  the  MXM  identity  matrix.  We  see  by  inspection  that  this  is 
symmetric,  and  by  squaring  we  see  that  it  is  idempotent 


where  we  have  noted  the  orthogonality  of  matrix  Q.  Thus  we  see  that  the 
g-inverse  defined  in  Eq.  (A--31)  is  adequate  for  forming  the  projection 
operator  onto  the  column  space  of  A. 


Now  recall  the  projection  operator  onto  the  row  space  of  A, 

AP  =  A+  A  • 

Substituting  Eq.  (A-31)  into  this  equation  yields 


Thus  this  factorization  satisfies  . le  requirements  for  the  Moore-Penrose 
g-inverse.  The  least-squares  functional  for  the  linear  least-squares  problem 
Ax  -  2  then  becomes 
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H*)  =  1 1  A  x  -  x  1 12 

=  Q  A  x  -  Q  y  1 12 


■  jr 

A' 

.  0  . 

x  - 

V 

i 

where  we  have  partitioned  Q  as 


The  solution  for  x  in  this  case  is  determined  uniquely  as 

4s  -  V1  «1  *  • 

leaving  a  residual  sum  of  squared  error 


^LS)  "  II  «2  *  IP  ' 


(A-33) 


(A-34) 


(A-35) 


A. 7  QR  Factorisation  of  Rank  Deficient  Matrices 


In  the  case  of  rank  deficient  matrices,  the  QR  factorization  does  not  lead 
to  a  g-inverse  that  satisfies  the  Moore-Penrose  conditions.  In  this  section, 
it  is  shown  that  a  truncated  version  of  the  QR  factorization  with  column 
pivoting  can,  however,  be  used  to  construct  a  g-inverse  suitable  for  forming 
the  projector  onto  the  column  space  of  A.  In  the  next  section,  the  complete 
orthogonal  factorization  will  be  presented,  which  solves  the  problem  of  rank 
degeneracy  and  leads  to  the  Moore-Penrose  pseudoinverse. 
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Consider  the  NXM  matrix  A  of  rank  r<M<N.  There  exists  an  NXN  orthogonal 
matrix  Q  and  an  MXM  permutation  matrix  S,  such  that 


Q  A  S 


R11 

R12 

.  0 

0  . 

(A-36) 


where 


By  truncating  R  (i.e., 


replacing  R^g  by  a  zero  matrix) , 


a  g-inverse  of  A  is 


For  this  factorization,  the  projector  onto  the  range  of  A  becomes 


(A-37) 


PA  =  Q 


=  Q 


R11 

R12  1 

0 

0 

J 

\h 

0  ' 

n 

.  0 

0  . 

H 

T 

S1  S 


hi  i 

0 

o  1 

0 

Q 


(A-38) 


which  (as  in  the  full  rank  case)  conforms  to  the  requirements  of  a  projection 
operator.  The  g-inverse  formed  from  the  truncated  QR  factorization  is 
therefor.;  suitable  for  forming  the  projector  onto  the  column  space  of  the 
basis  function  matrix. 


Let  us  now  form  the  product 


A+  A  =  S 


IL. 

li 

1  0  1 

1  Ion1 

r  R--  i 
11  1 

R., 

1Z 

c 

4  H 

0  J 

.  0 

0 

m 

sJ 


=  s 


R  111I  12 


0  I  0 


(A-39) 
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This  is  clearly  ncnsymmetric  and  therefore  does  not  constitute  an  orthogonal 
projector.  Thus,  the  g-inverse  formed  from  the  truncated  QR  factorization 
does  not  conform  to  the  requirements  for  the  pseudoinverse. 


A. 8  Complete  Orthogonal  Factorisation  of  Rank  Deficient  Matrices 

To  obtain  a  minimum  norm  least-squares  solution  in  the  rank  deficient 
case,  we  use  an  extension  of  the  QR  factorization,  the  complete  orthogonal 
factorization.  This  factorization  is  a  suitable  alternative  to  the  singular 
value  decomposition  for  performing  the  rank  reduction  necessary  to  obtain  the 
minimum  norm  linear  least-squares  solution. 

Again  consider  the  NXM  matrix  A  with  rank  r<M<N  and  the  orthogonal 
factorization 


Q  A  S  =  R  = 


R 


11 


12 


0 


where 


Rll  = 


0 


rXr 


There  exists  an  MXM  orthogonal  matrix  V  such  that 


Y=QASV=R  = 


Rn  I  0 


0  J 


(A-40) 


where 


v - 4-V.2~  ™ 


x x v  w  uuio * 


"X1A.  5C  ao 


A+  =  S  V 


R 

hi 

0 

0 

0  . 

(A-41) 
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Now  forming  the  projection  operator  P^,  we  obtain 


PA  =  A  A* 


=  ,T  [  ?nl2  ]  vtst  s  y  V 


0  I  0 


,T  r 


Q  -  Q 

0  0  J 


(A-42) 


which,  once  again,  is  seen  to  be  symmetric  and  idempotent.  If  we  now  attempt 
to  form  the  projector  onto  the  row  space  of  A,  we  get 

,P  =  A+  A 


=  SV  — - *—  Q  Q1  — 

0  0  J  0 


Ij.  ®  7  'P 

=  S  V  Y1Si  , 

.0  0  . 


VTST 


(A-43) 


which  is  also  symmetric  and  idempotent.  Thus  the  complete  orthogonal 
factorization  leads  to  a  g-inverse,  which  is  suitable  for  forming  both 
projection  operators.  This  g-inverse  therefore  satisfies  all  of  the 
requirements  for  the  Moore-Penrose  generalized  inverse. 


Using  the  g-inverse  constructed  from  the  complete  orthogonal 
factorization,  the  least-squares  functional  becomes 
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0(x)  =||  AX  -  X  || 

=  ||  «AS  -  Hi  )|2 

- 1|  uss1,.  n  ||2 
=  ||  QASVvVjc-d!  ||2 
=  ||  ivVj-d  ||2. 

T 

If  we  let  w  =  S  x  partition  Q  as 


and  partition  V  as 


T-kk]  - 

where  Vj  is  MXr  and  Vg  is  MX(M-r),  then  we  obtain  for  the  least-squares 
functional 


*11  1  0  1 

[i!l 

_V 

0  1  0  . 

1v2tJ 

w  - 

V 

i 

*n  vi  ■ 

A' 

| 

o 

V?  — 

«2J 

w  i 

*  1 
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From  this,  we  obtain  the  solution 


-1  =  V1  ^1  1  *  (A“44) 
We  obtain  a  final  minimum  norm  solution  as 

-LS  =  S  V1  Sll_1  ^1  1  »  (A“45) 

which,  as  in  the  full  rank  case,  leaves  a  residual  sum  of  squares 

=  II  %  *  II2  '  (A“46) 


A. 9  Near  Rank  Deficient  Matrices 

Consider  the  NxM  matrix  A  with  numerical  rank  p  =  M<N,  but  whose  expected 
(ideal  rank)  is  r£M.  There  exists  an  NXN  orthogonal  matrix  Q  and  an  MXM 
permutation  matrix  S,  such  that 


4  A  S  =  R  = 


(A-47) 


where 


Column  pivoting  at  each  stage  of  the  factorization  will  result  in  a  matrix 
that  can  be  further  partitioned  as 


where 
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and 


(M-r)X(M-r) 


If  A  were  truly  rank  deficient,  Rgg  would  consist  of  zeros.  But  because 
of  perturbations  in  A,  will  have  non-zero  elements.  If  the  perturbations 
are  small,  however,  then  the  elements  of  Rgg  should  also  be  small  so  that  the 
rank  deficiency  can  be  uncovered  when  | I ^22  ^ ^  becomes  muc^  smaller  than  MA||. 
Then  rank  reduction  can  be  achieved  by  setting  to  zero  and  solving  the 
remainder  of  the  problem  as  a  truly  rank  deficient  case. 


Golub  and  Van  Loan  [12]  point  out  that  there  are  cases  in  which  at  no  step 
during  the  orthogonalization  process  is  the  norm  of  1^2  very  small,  even 
though  the  original  matrix  is  rank  deficient.  But  they  also  go  on  to  say  that 
this  method  of  rank  determination  "works  well  in  practice."  The  reader  is 
referred  to  Section  6.4  of  Golub  and  Van  Loan  [12],  and  to  Golub,  Klema,  and 
Stewart  [14] . 
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APPENDIX  B  -  ORTHOGONAL  FACTORIZATION  BY  SUCCESSIVE  HOUSEHOLDER  TRANSFORMATION 

In  this  appendix,  we  describe  a  method  of  triangularizing  matrices  which 
uses  the  highly  stable  Householder  transformation,  also  called  an  elementary 
reflector  (see  Ref.  15).  This  transformation  introduces  zeros  into  a  column 
or  row  of  a  matrix,  depending  on  whether  it  is  applied  from  the  left 
(premultiplication)  or  from  the  right  (postmultiplication) .  In  the  former 
case,  we  premultiply  by  a  sequence  of  elementary  reflectors  to  transform  an 
NXM  matrix  (N>M)  into  an  upper  triangular  matrix  (or  upper  trapezoidal  matrix 
if  N<M) .  This  leads  to  the  (JR  factorization  defined  as  (J  A  =  R.  In  the 
latter  case,  we  transform  an  NXM  matrix  (N<M)  into  a  lower  triangular  matrix 
(or  lower  trapezoidal  if  N>M) .  This  can  lead  to  the  RY  factorization  defined 
as  A  V  -  R,  or  to  the  complete  orthogonal  factorization  as  discussed  in 
Appendix  A. 


B.l  Householder  Transformation  fro*  the  Left 

Given  a  full  rank  NXM  matrix  A,  where  N>M,  we  premultiply  A  with  asequence 
of  M  elementary  matrices;  i.e.,  we  form  Hjj  •••  H^  A  =  R,  to  yield  the  NXM 
upper  triangular  matrix  R.  The  H^  are  defined  as 


H.  = 

1 


) 


where  I,. 

(i 

Householder  reflector  matrix  constructed  to  introduce  zeros  below  the  i’th 
diagonal  element  of  A^  ^  =  H^_^  •••  H^  A.  For  example,  at  the  beginning  of 
the  fourth  stags  of  the  triangulari  Ion  process,  we  have  the  matrix 


_.q  is  an  (i-l)X(i-l)  identity  matrix  and  U^  is  an  (N-i+l)X (N-i+1) 
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a(l) 

all 
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a(3) 

a64 

a(3) 

a65 

0  0  0 

a(3) 

a6M 

•  o 

0 

0 

0 

• 

a(3) 

a74 

0 

a(3) 

a75 

0 
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a(3) 

a7M 

0 

• 

• 

0 

0 

0 

0 

0 

0 

0 

0 

0 

(3) 

0  0  0 

0 

0 

(3) 

“Ml 

Here  the  superscripts  on  the  elements  designate  the  number  of  previous 
Householder  reflectors  that  have  transformed  the  particular  element;  e.g.,  the 
elements  in  the  first  row  are  affected  only  by  the  first  Householder  reflector 
so  that  these  elements  have  the  superscript  (1) ,  the  elements  in  the  second 
row  are  affected  by  the  first  two  Householder  reflectors  so  that  these 
elements  have  the  superscript  (2) ,  and  the  nonzero  elements  below  the  second 
row  have  been  transformed  by  all  three  of  the  previous  reflectors  so  that 
these  elements  have  the  superscript  (3) . 

We  wish  to  introduce  zeros  below  the  first  element  of  the  vector 
designated  within  the  box.  If  we  denote  this  (N-i+1) -vector  as  x,  then  we  we 
wish  to  find  the  Householder  reflector,  U,  such  that 

U  x  =  -  a  e-,  (B-l) 
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The  Householder  reflector  that  achieves  this  is  defined  as 


TT  T  ft~  1  T 

U  =  I  -  p  u  u  , 

0-2) 

where 

a  =  sgn(xx)  | |x| |  , 

0-3) 

u^  =  x^  +  a  , 

0-4) 

Ui  =  Xi  '  i=2>  * ’ • »n  » 

0-3) 

and 

II 

e 
>— 4 

0-6) 

Since  only  the  vector  u.  and  the  scalar  are  necessary  for  forming  at 
each  stage,  all  information  concerning  the  construction  of  the  IL  can  be  saved 
by  storing  the  l?,st  N-i  elements  of  u^  below  the  i’th  diagonal  element  of  A 
and  storing  the  pre-transformation  value  of  the  diagonal  element  in  an 
auxilary  \ectcr  (note  that  the  post-transformation  value  of  the  diagonal 

element  is  0’^,  so  that  is  indirectly  available) . 

* 

When  applying  the  Householder  transformation  to  an  arbitrary  vector  jr  we 
use  the  equation 
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After  Householder  reflectors  have  been  constructed  for  all  M  columns,  the 
orthogonal  matrix  Q  is  defined  as 


(B-8) 

Since  the  are  elementary  matrices,  they  are  symmetric,  so  that  the 
transpose  of  is  just 


^  ~  \  ***  H1  ' 


(JT  =  H2  ^  •••  Hm  .  (B-9) 

We  can  therefore  multiply  by  the  transpose  of  Q  simply  by  reversing  the  order 
in  which  each  of  the  elementary  transformations  are  applied.  Thus  one  need 
never  explicitly  form  the  matrix  Q. 

The  stability  of  the  Householder  method  can  be  ensured  by  using  column 
pivoting  at  each  stage  of  the  factorization  to  bring  the  column  of  largest 
norm  to  the  pivot  position.  At  each  stage,  the  pivoting  causes  an  elementary 
matrix  to  be  factored  to  the  right;  i.e.,  each  column  swap  is  recorded  by 
postmultiplying  by  an  identity  matrix  with  the  same  two  columns  interchanged. 
At  the  i’th  step,  we  would  form  A^  =  ^  ^  S^,  where  is  an  elementary 

matrix  representing  the  i’th  column  pivot.  At  completion,  the  factorization 
appears  as  Q  A  S  =  Hy  Hg  Hj  A  Sg  •••  Sy  =  R. 


B.2  Householder  Transformation  from  the  Right 

Given  an  NXM  matrix  A  with  full  row  rank,  we  postmultiply  A  with  asequence 
of  elementary  matrices  to  yield  A  H  ^  ***  Hy  =  R.  Here,  the  are  formed 

in  exactly  the  same  way  as  when  transforming  from  the  left,  except  that  IT  is 
constructed  to  introduce  zeros  to  the  right  of  the  i’th  diagonal  element; 
i.e.,  into  the  i * th  row . 

To  parallel  our  previous  example,  at  the  i’th  stage  of  the 
triangularization  process,  we  have  the  matrix 
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J3) 

Here,  we  construct  the  Householder  reflector  to  introduce  zeros  to  the  right 
of  the  first  element  in  the  vector  designated  within  the  box.  Denoting  this 
(M-i+1) -vector  as  x  ,  we  wish  to  find  the  Householder  reflector  U,  such  that 

xT  U  =  -  a  e1T  ,  (B-10) 

where  a  and  e^  are  the  same  as  in  Section  B.l.  The  construction  of  U  is  also 
the  same  as  in  Eq*  (B,l). 

T 

In  applying  the  Householder  reflector  to  an  arbitrary  row  vector  £  ,  we 
use  the  equation 


IT  U  =  /  (  I  -  fl  u  UT  ) 

=  1  ~  P~X  (  1  'A  )  uT  •  (B— 11 

Row  pivoting  could  be  used  to  ensure  stability,  just  as  column  pivoting 
was  used  in  the  previous  section.  Usually,  however,  transformations  from  the 
right  are  applied  to  matrices  that  already  exhibit  special  structure  and  are 
expected  to  be  reasonably  conditioned,  so  that  row  pivoting  would  serve  only 
to  disturb  the  existing  structure. 
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After  Householder  reflectors  have  been  constructed  for  all  N  rows,  the 
orthogonal  matrix  V  is  defined  as 

V  =  E1  Hg  •••  Hm  .  (B— 12) 

Again,  since  the  are  symmetric,  the  transpose  of  Y  is 

VT  =  Hm  •••  Ibj  Hx  .  (B-13) 

Essentially,  applying  orthogonal  transformations  from  the  right  of  a 

matrix  is  the  same  as  applying  transformations  from  the  left  of  the  transpose 

T 

of  the  same  matrix,  and  then  transposing  the  entire  equation.  First,  Q  A  = 

R^,  where  Ry  denotes  an  upper  triangular  matrix.  Then,  transposing  yields 

XT  x  T 

A  Q  =  Ry  .  Now,  letting  V  =  Q  and  R^  =  Ru  (a  lower  triangular  matrix) ,  we 

get  our  desired  factorization. 
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APPENDIX  C  -  REVIEW  OF  LEAST-SQUARES  OPTIMIZATION  TECHNIQUES 

In  nonlinear  optimization  (also  see  Refs.  16  and  17)  we  begin  with  an 
initial  set  of  parameters  and,  through  a  succession  of  iterations,  update  the 
parameters  in  a  way  such  that  the  sequence  of  updated  parameters  hopefully 
converges  to  the  ideal  set.  We  go  about  this  by  assigning  a  measure  of 
closeness  between  a  model  of  the  physical  phenomenon,  which  is  a  function  of 
the  parameter  set,  and  the  experimentally  observed  values.  In  the  case  of 
least-squares,  we  use  as  that  measure  the  sum  of  the  squares  of  the  errors 
between  our  parameterized  model  and  the  observations;  i.e.,  we  assign  the  cost 
functional  (or  error  norm) 


2  2 

*(6)  =  ||  s(8)  ||  =  ||  x-a(§)  ||  ,  (C-i) 

where  8  is  the  set  of  parameters,  x  is  the  vector  of  observations,  and  x(8)  is 
the  parameterized  model.  We  then  go  about  minimizing  this  cost  functional  by 
appropriately  adjusting  the  parameter  vector,  8. 

In  this  appendix,  we  introduce  three  optimization  techniques  that  can  be 
used  to  minimize  the  least-squares  functional.  They  are  the  method  of 
steepest  descent,  Nekton’s  method,  and  the  Gauss-Newton  method.  We  first  will 
introduce  a  general  class  of  strategies  called  gradient  methods,  within  which 
the  three  methods  mentioned  fit.  The  steepest  descent  method  follows  directly 
from  the  acceptability  criteria  for  the  gradient  methods.  We  then  introduce 
Newton’s  method  as  a  second-order  Taylor  approximation  to  the  error 
functional,  and  finally  we  introduce  the  Gauss-Newton  method  as  a 
simplification  of  Newton’s  method. 


G.l  Gradient  Methods  of  Optimization 

At  each  iteration  the  adjustment,  or  update,  consists  of  a  direction  and  a 
step  size;  i.e.,  the  update  is 


A  =  p  d  , 


(C-2) 
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where  p  is  the  step  size  and  d  is  the  step  direction.  To  assure  that  this 
update  can  lead  to  a  decrease  in  the  cost  functional,  we  must  first  require 
that  the  step  direction  d  form  greater  than  a  90°  angle  with  the  gradient  (be 
downhill  on  the  contour  of  $)  at  the  current  iterate  of  parameters  8  . 

To  ascertain  this,  we  first  note  that  given  a  direction  d,  the  updated 
parameter  vector  is  solely  a  function  of  the  step  size;  i.e.,  8(/>)  = 

8^  +  p  d  .  The  cost  functional  along  this  direction  is  then 

*dffl0>)>  =  *{  EW  +  p  d  }  . 

Differentiating  this  with  repect  to  p  and  evaluating  at  p  =  0  yields  the 
directional  derivative  of  the  cost  functional  at  the  current  iterate;  i.e., 


a*d{0(/>)} 


9  P 


P  =  0 


Noting  the  chain  rule,  this  becomes 


9*d{0(/>)}  1 

T 

91( P)  I 

.  38  (p)  . 

.  9/7  J 

P  -*  o 


(0  ^ 

=  £  d  , 

where  is  the  gradient  of  $(8)  at  8^.  The  step  direction  d  is  then 

downhill  if  the  directional  derivative  is  negative;  i.e., 


V  =  £  S  <  0  , 


(C-3) 


where  we  have  dropped  the  iteration  index  on  the  gradient  vector.  One  way  to 


assure  this  is  to  let  the  step  direction  be 

d  =  -  g  , 


(C-4) 
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which  is  precisely  the  case  in  the  method  of  steepest  descent — so  called 
because  initially,  this  is  the  direction  in  which  the  cost  functional  descends 
most  rapidly.  Steepest  descent  is  well  known,  however,  for  its  slow 
convergence  due  to  a  zig-zag  pattern  along  troughs  in  the  cost  functional 
contour.  Therefore,  an  alternative  is  desirable. 

Another  way  to  assure  that  the  directional  derivative  is  negative  is  to 
find  a  positive  definite  matrix,  R,  and  let 

d  =  -  R  g  .  (0-5) 

An  optimization  technique  in  which  the  direction  is  so  chosen,  regardless  of 
whether  the  matrix  R  is  positive  definite,  is  called  a  gradient  method.  If 
the  matrix  R  is  strictly  positive  definite,  then  d  is  called  an  acceptable 
gradient  direction.  For  all  gradient  directions,  the  directional  derivative 
is  given  by  the  quadratic  form 

V  =  -  £T  £  •  (0-6) 


C.2  Newton’s  Method 

In  Newton’s  method,  we  choose  for  the  matrix  R  the  inverse  of  the  Hessian 
matrix  of  the  error  functional  at  the  current  iterate.  The  typical  element  of 
the  Hessian  matrix  is 


82$(0) 

80.30. 

J  i 


This  choice  for  the  matrix  R  arises  by  approximating  the  error  functional  with 
a  second-order  Taylor  polynomial 


and  by  optimizing  this  approximate  cost  functional  over  the  parameters;  i.e., 
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differentiating  with  respect  to  8  and  equating  the  result  to  zero  to  yield  the 
optimum  parameter  set 


If  the  Hessian  matrix  is  positive  definite,  then  its  inverse  is  also 
positive  definite  and  Newton’s  method  yields  an  acceptable  direction. 
Furthermore,  if  the  objective  functional  is  quadratic,  the  Taylor 
approximation  is  exact  and  Newton’s  method  will  converge  in  a  single 
iteration.  Even  if  the  error  surface  is  not  quadratic  but  is  nearly  so  (as  is 
often  the  case  in  the  neighborhood  of  the  ideal  parameters),  then  Newton’s 
method  offers  quadratic  convergence  without  the  need  for  a  line  search  to 
determine  step  size. 

C.3  The  Gradient  of  the  Least-Squares  Error  Nora 

At  this  point,  it  will  be  useful  to  obtain  an  expression  for  the  gradient 
of  the  least-squares  error  functional.  The  arbitrary  least-squares  functional 
can  be  written  as 


*(D  ■  (  s(8) )  sffl)  •  (o-8) 

A  typical  term  in  the  gradient  is 

«i  =  DiUT£)  • 

Thus 

=  ^(e7)  e  +  eT  D^)  ,  (C-8) 

g.  =  2  D.(eT)  e  ,  (C-10) 
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and  &i  =  2  eT  D.,(e)  ,  (C-ll) 

where  is  the  operator  that  performs  partial  differentiation  with  respect  to 
8^  and  where  we  have  dropped  the  iteration  index,  as  well  as  the  explicit 
dependence  of  e  on  8.  The  equality  of  the  last  three  expressions  follows  from 
the  fact  that  both  terms  in  Eq.  (C-9)  represent  the  same  scalar  product  (since 
the  error  vector  is  real).  In  further  derivations,  we  will  use  either 
Eq.  (C-10)  or  Eq.  (C-ll) ,  depending  on  which  is  more  convenient  for  the  given 
purpose. 


C.4  The  Gauss-Newton  Method 

If  we  differentiate  (C-ll)  with  respect  to  the  j’th  parameter,  we  get  the 
ij’th  component  of  the  Hessian  matrix  as 


H. .  =  2  D. (eT)  D. (e)  +2  eT  D?.(e)  , 

ij  l  '  2  ~  lj w 


(C-12) 


where  D..(e)  is  the  second  partial  derivative  of  the  error  vector  with  respect 
to  the  parameters  8^  and  8^,  or  the  second-order  sensitivity  derivative. 
(Similarly,  D^(e)  is  the  first-order  sensitivity  derivative.  The  names 
reflect  the  fact  that  these  quantities  measure  the  sensitivity  of  the 
parameterized  model  to  changes  in  the  parameters.)  If  we  assume  that  the 
error  vector  is  small  in  the  neighborhood  in  which  we  are  optimizing,  then  we 
can  neglect  the  term  involving  the  second-order  sensitivity  derivative  and 
approximate  the  Hessian  matrix  as 


N-.  =  2  D.(eT)  D.(e)  .  (C-13) 

The  equation  for  the  approximate  Newton  direction;  i.e.,  d  =  -N  ^g,  can  be 
rewritten  as  the  system  of  equations 

N  d  =  -  g.  (0-14) 


I 
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If  we  now  define  the  Jacobian  matrix  of  the  error  vector,  given  by 

J=  [  D-(e  )  |  D2Ce)  |  •••  |  DK(g)  ]  , 

then  we  can  rewrite  the  gradient,  from  Eq.  (C-10) ,  as 

g  =  2  JT  e  ,  (G— 15) 

and  we  can  write  the  Gauss  approximation  to  the  Hessian  matrix  as 

N  =  2  JT  J  .  (C-16) 

Note  that  the  matrix  N  is  always  at  least  positive  semidef inite,  so  that 
the  concerns  of  the  Hessian  being  negative  definite  or  indefinite  are 
alleviated  with  this  approximation.  Substituting  Eqs.  (C-15)  and  (0-16)  into 
Eq.  (C-14) ,  the  system  of  equations  for  the  Gauss-Newton  direction  is  then 

JT  J  d  =  -  JT  e  .  (C— 17) 

But  this  is  just  the  set  of  normal  equations  for  the  linear  least-squares 
problem  in  which  the  error  vector  is  projected  onto  the  columns  of  the 
Jacobian  matrix ;  i . e . , 

J  d  a  e  .  (0-18) 

The  solution  for  this  problem  is 

d  =+-  J  e  ,  (0-19) 

where  J+  is  the  pseudoinverse  of  the  Jacobian  matrix.  Thus  the  Gauss-Newton 
method  can  be  viewed  as  a  sequence  of  linear  approximations  to  the  cost 
functional.  Note  that,  unlike  Newton’s  method,  this  method  does  require  a 
line  search  to  find  an  optimum  step  size  p. 
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While  there  are  several  other  popular  optimization  techniques  for  least- 
squares,  including  a  number  of  variations  on  Newton’s  method  that  use  other 
methods  to  ensure  the  positive  definiteness  of  the  Hessian  matrix  (e.g.,  the 
Marquardt  method,  the  Greenstadt  method,  etc.),  we  will  confine  our  attention 
to  the  three  methods  mentioned. 
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APPENDIX  D  -  DERIVATIVES  OF  PROJECTORS  AND  PSEUDOINVERSES 

In  this  appendix,  we  parallel  Golub  and  Pereyra  [1]  in  deriving  the 
derivative  of  the  projection  operators  (both  onto  the  column  space  of  a  matrix 
and  onto  the  row  space  of  a  matrix)  and  the  derivative  of  the  pseudoinverse. 

As  noted  in  Chapter  5  of  the  text,  we  retain  the  index  of  differentiation  in 
all  derivations  so  that  the  final  results  are  expressions  for  partial 
derivatives . 


D.l  The  Derivative  of  the  Projection  Operator 

To  calculate  the  derivatives  of  the  variable  projection  functionals,  we 
need  a  method  for  calculating  the  derivative  of  the  projection  operators 
D^(P^)  and  D^(AP).  We  proceed  first  in  evaluating  D^(P^)  by  noting  that  P^  is 
idempotent,  so  that 


W  ■  Di<PA  PA> 

-  W  PA  +  PA  W  •  t®'1) 

Thus  the  problem  becomes  one  of  evaluating  the  two  terms  on  the  right-hand 
side  of  Eq.  (D-l) .  We  start  with  the  first  term  by  noting  that  A  =  P^  A  so 
that 


Di « -  Miyo 

■  W  A  *  PA  ' 


Rearranging  yields 


Di(PA)  A  =  D.(A)  -  PA  D.(A) 
=  p/  D.(A). 


(D-2) 


CD-3) 
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+ 

Postmultiplying  both  sides  of  this  equation  by  A  and  noting,  on  the  far  left 
hand  side,  that  A  A+  =  P^,  we  obtain  the  first  of  the  desired  expressions, 

l 

W  PA  "  PA  C(A>  A  •  OM) 

We  proceed  with  the  second  term  by  noting  that  the  projection  operator,  and 
hence  its  partial  derivative,  is  symmetric,  so  that 

( B.ay  pA  f  =  pA  »a(pa)  . 

Substituting  Bq.  (D— 4)  into  the  above  equation  yields  the  desired  result, 

pa  W  ■  ( V  Di«  A+  )* 

-  (A*)T  D.  (A1)  p/  .  (D-S) 

Substituting  Eqs.  (D— 4)  and  (D-5)  back  into  Eq.  (D-l)  yields  the  partial 
derivative  of  the  projection  operator,  given  by 

W  *  *k  Di(A)  A+  +  (a+)t  Di(AT)  h  •  (°-6) 

Since  the  row  space  of  a  matrix  is  the  same  as  the  column  space  of  the 
transpose  of  the  matrix,  to  obtain  an  expression  for  the  derivative  of  the  row 
space  projector,  we  merely  replace  A  by  its  transpose  everywhere  in  Eq.  (D-6) , 
which  yields 

»1<AP> 


yo-tA1)  (a+)t  .  (yp^A1)  <av  ) 


1  1  ^ 
A+  D. (A)  .P  +  f  A+  D. (A)  .P  1  . 


(D-7) 
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D.2  The  Derivative  of  the  Pseudoinverse 

We  now  provide  an  expression  for  the  partial  derivative  of  the 
pseudoinverse  of  a  matrix,  which  is  necessary  for  deriving  the  Hessian  matrix 
of  the  YPF.  We  begin  by  noting  that 


1  ■  PA  +  PAi  • 


so  that 


D.(A*)  =  D.(A+) 


PA  -  D.(A+) 


CD-8) 


We  now  wish  to  find  expressions  for  both  terms  on  the  right-hand  side  of 
Eq.  (D-8) .  Looking  at  the  first  of  these  two  terms,  we  begin  by  recalling 
that  A+  =  A+  A  A+  so  that  D.  (A+)  =  D.  (A+)  P.  +  A+  D.  (A)  A+  +  .P  D.  (A+) . 

1  X  A  1  AX 

Rearranging  yields 


Di(A+)  PA  =  Apl  VA+)  “  A+  VA>  A+‘  (°"9) 

We  now  need  a  manageable  expression  for  the  first  term  on  the  right-hand  side 
of  this  expression.  To  obtain  this,  we  again  start  with 

A+  =  A+  A  A+ 


Then  D.(A+)  =  D.y?)  A+  +  AP  D.(A+)  .  (D-10) 

Rearranging  yields 


I 

jf  Di(A+)  =  D.(aP)  A+  .  (D-ll) 
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Substituting  Eq.  (D-7)  into  Eq.  (D-10) ,  -we  obtain 


P1  D.  (A+)  =  A*  D1(A)  y  A*  *  (  A+  /  )  A 
/  A+  =  (  I  -  A*  A  ]  A* 


.1  !T  .. 


But 


=  A+-  A+  A  A+  =  0  , 


se  «  get  /  Di(A+)  -  (  A+  D.(A)  /  )‘  A*  .  &-W 

Transposing  on  the  right  and  substituting  this  back  into  Eq.  (D-9)  yields 


1  1T  .♦ 


1 


(D-13) 
We  proceed  by 


Di  (A+)  PA  =  AP  Di(A^)  (AT  A+  -  A+  D±(A>  A  . 

We  now  turn  our  attention  to  the  second  term  in  Eq.  (D-8) . 
writing 

A+  =  A+  A  A+ 


=  A+  PA  , 

so  tut  Di(A+)  =  D.(A+)  Pa  .  A+  \(Vf)  ■  (D_14) 

Rearranging  and  substituting  Eq.  (D-6)  in  the  rightmost  term,  we  obtain 


B.(A*)  p/  =  A+  p/  B.(A)  A+  +  A+  [  p/  i-.(A)  A*  )*• 


1 


(D— 15) 
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But  A+  P^  =  A+  (  I  -  A+  A  ) 

=  A+  -  A+  A  A+  =  0  , 

so  we  get  D^(A+)  ^A  =  D^(A^)  P^  . 

Finally,  we  obtain  the  partial  derivative  of  the  p-  ;adoinverse  by 
substituting  Eqs.  (D-13)  and  (D-lfi)  back  into  Eq.  (D-8) ,  ^hich  yields 

Di(A+)  =  y  D.(AT)  (A+)T  A+  -  A+  D.(A)  A+  +  A+  (A+)T  D.(AT)  p/. 


(D-16) 


(D— 17) 
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APPEND IX  E  -  FRECHBT  DERIVATIVES  AND  A  SIMPLIFIED  TENSOR  NOTATION 

Here  we  examine  the  simplified  tensor  notation  of  Golub  and  Pereyra  [1] 
and  show  the  difficulty  in  carrying  this  notation  on  to  higher  order 
derivatives.  First,  we  define  the  Frechet  derivative  of  the  basis  function 
matrix,  which  is  a  three-dimensional  array  consisting  of  the  partial 
derivatives  of  the  basis  function  matrix  with  respect  to  each  of  the  signal 
pole  parameters.  Consider  the  example  case  of  a  model  with  two  conjugate  pole 
pairs.  There  are  then  four  signal  pole  parameters,  and  the  Frechet  derivative 
of  the  basis  function  matrix  [denoted  by  D(F)j  will  appear  as 


Multiplication  of  this  tensor  with  a  matrix  is  achieved  by  multiplying  each 
partial  derivative  matrix  with  the  multiplicand  matrix,  yielding  another 
three-dimensional  array  as  illustrated  below. 
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Multiplication  of  the  tensor  with  a  vector  results  in  what  we  will  call  a 
"degenerate  tensor  of  valence  two."  There  are  two  cases  in  which  this  occurs. 
When  the  valence  three  tensor  premultiplies  a  column  vector,  the  result 
appears  as  follows. 


When  a  row  vector  premultiplies  the  valence  three  tensor,  the  result  appears 
as  follows. 


If  we  pre-  and  post-multiply  the  valence  three  tensor  with  a  row  vector  and  a 
column  vector,  respectively,  then  we  obtain  what  we  will  call  a  "degenerate 
tensor  of  valence  one."  This  result  appears  as  follows. 
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That  the  results  in  the  last  three  cases  are  not  vectors  and  matrices  as 
we  normally  think  of  them  should  be  clear  from  the  illustrations.  In  the  last 
case  in  particular,  this  valence  one  tensor  is  neither  a  row  vector  nor  a 
column  vector  in  the  usual  sense;  i.e.,  in  the  plane  of  the  paper.  This  last 
case  precisely  describes  the  gradient  vector. 

The  real  problem  occurs,  however,  when  one  wishes  to  form  the  Hessian 
matrix  by  again  differentiating  the  gradient  vector  with  respect  to  the 
parameter  vector.  When  we  performed  the  differentiation  above,  the  partial 
derivatives  were  lined  up  in  a  third  dimension  that  was  not  along  a  column  or 
a  row.  Now  we  wish  to  differentiate  a  valence  one  tensor  whose  elements 
already  lie  along  this  third  dimension,  so  that  we  must  now  line  up  the  second 
partial  derivatives  along  some  fourth  dimension.  Additional  notation  will,  at 
this  point,  be  necessary  in  order  to  distinguish  which  dimension  we  are 
dealing  with.  This  problem  becomes  critical  when  attempting  to  multiply  two 
of  these  valence  three  tensors  (one  in  the  third  dimension  and  one  In  the 
fourth  dimension ),  as  occurs  when  forming  the  Hessian.  One  could  introduce 
notation  to  keep  track  of  these  different  dimensions,  or  one  could  abandon 
this  now  not-so-simplified  notation  and  adopt  the  full  tensor  notation.  Or 
one  could  simply  treat  each  of  the  derivatives  as  a  group  of  partial 
derivatives,  retaining  the  indexing,  and  ignore  the  specifics  of  the  higher 
order  vector  calculus,  as  we  have  chosen  to  do. 
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APPENDIX  F  -  PROOF  THAT  OPTIMIZATION  OF  THE  VARIABLE  PROJECTION  FUNCTIONS 
LEADS  TO  THE  SAME  MINIMA  AS  OPTIMIZATION  OF  THE  LEAST-SQUARES  FUNCTIONAL 

In  this  appendix,  we  expound  upon  the  proof  by  Golub  and  Pereyra  [1]  that 
sequential  optimization  of  the  signal  poles  and  amplitudes  via  the  variable 
projection  functional  leads  to  the  same  optimum  values  as  does  the 
simultaneous  optimization  of  the  parameters  via  the  least-squares  functional. 
For  simplicity  in  the  proof,  we  will  follow  the  notation  of  Golub  and  Pereyra 
and  introduce  the  Frechet  derivative  D(F),  which  is  an  array  of  partial 
derivative  matrices.  This  derivative  is  described  further  in  Appendix  E. 

Recall  the  least-squares  cost  functional  and  error  vector,  given  by 

2 

^(£>3)  =  ||  e(0, a)  || 
and  e(0,a)  =  y  -  F(0)  a  . 

Let  us  now  define  the  Jacobian  matrix  of  the  least-squares  error  vector, 


J£,a  ~  [  J0  »  Ja  ]  ’ 

(F-l) 

where 

f  1 

Je  -  t  "  "  D  a  » 

-  301  k  1 

(F-2) 

and 

3e 

Ja  *  ,  T  -  -  F®  ' 

(P-3) 

The  giadient  vector  of  the  least-squares  functional  can  then  be  defined  as 


(F-4) 
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If  we  now  let 


=  F+  i 


(F-5) 


then  the  gradient  of  the  least-squares  functional  becomes 

T 

V*(£  ,a*)  =  -  2  (  i  -  F  a*  )  [  D(F)  a*  ,  F  ] 

T 

=  -  2  (  i  -  F  F+  i  )  [  D(F)  F+  i  ,  F  ] 

=  -  2  iT  Pp1  [  D(F)  F+  i  ,  F  ] 

=  -  2  {  [  iT  Pp1  D(F)  F+  i  ]  .  [  xT  Pp1  F  ]  }  .  (F-6) 

1 

But,  since  Pp  F  =  0  , 

the  gradient  of  the  least-squares  functional  becomes 

?*(a,£)  =  -  2  {  [  XT  Pp1  D(F)  F+  2  ]  ,  0  }  .  (F-7) 

Now  recall  the  signal  basis  VPF, 


*(£) 


1 

PF  I 


The  grsidient  of  this  functional  is 


mi) 


l 


PF 


D(F)  F+  i  . 
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If  we  assume  that  0*  is  a  critical  point  of  f (0) ,  then 

VjK£*)  =  2  XT  Pp1  D(F)  F+  I  =  0  . 

*  sic  j|t  t  , 

Therefore,  for  0  and  a  as  defined  above,  the  gradient  of  the  least-squares 
functional  becomes 

)  -  [  o  ,  o  ]  , 

which  proves  that  a  critical  point  of  f(0)  is  a  critical  point  of  y(0,a)  when 
a  is  defined  as  above.  We  will  now  prove  that  a  global  minimizer  of  ^(0)  is 
also  a  global  minimizer  of  ^(£,a). 

For  any  given  9,  the  solution  of  the  least-squares  problem  becomes  a 
linear  problem,  which  is  straightforwardly  minimized  by  letting  a*  =  F+  y. 
Therefore,  for  any  given  0,  j>(0)  i  y(0,a ).  Now,  if  we  assume  that  9*  is  a 
globr.l  minimizer  of  jf(0)  and  a  is  defined  as  in  Eq.  (F-5) ,  then  certainly 
X(9  >a  )  =  j>(0  ).  Now  assume  that  there  exist  0  and  a  such  that  y(0,a)  < 
*(£*>2*).  Since  | >(0)  £  y(£,a),  this  requires  that  j>(0)  <  y(0,a)  £  ^(£*,a*)  = 
jK£*) .  But  since  0*  is  the  global  minimizer  of  #(0),  we  must  have  equality  on 
all  counts.  Therefore,  a  global  minimizer  of  ^(0)  is  also  a  global  minimizer 
of  ^(£,a)  over  0.  The  converse  [i.e.,  that  a  global  minimizer  of  J^(0,a)  over 
£  is  also  a  global  minimizer  of  ^(£)]  is  also  proven  by  Golub  and  Pereyra  [1] , 
following  a  similar  argument  as  above. 
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Subj :  CHANGE  TO  NRL  MEMORANDUM  REPORT  6643 

Ref:  (a)  NRL  Memorandum  Report  6643  of  27  Dec  1990 


1.  Delete  reference  4  in  reference  (a). 

2.  Please  call  Irene  Gonzalez  (407-857-5131)  or  Lu’Anne  Jevnager 
(407-857-5237) ,  if.  you  have  any  questions  concerning  this  matter. 
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